Module 01

Module 01 portfolio check

  • Installation check
    • Completion status:
    • Comments:
  • Portfolio repo setup
    • Completion status:
    • Comments:
  • RMarkdown Pretty html Challenge
    • Completion status:
    • Comments:
  • Evidence worksheet_01
    • Completion status:
    • Comments:
  • Evidence worksheet_02
    • Completion status:
    • Comments:
  • Evidence worksheet_03
    • Completion status:
    • Comments:
  • Problem Set_01
    • Completion status:
    • Comments:
  • Problem Set_02
    • Completion status:
    • Comments:
  • Writing assessment_01
    • Completion status:
    • Comments:
  • Additional Readings
    • Completion status:
    • Comments:

Data science Friday

Installation check

[Image] [Image] [Image]

Portfolio repo setup

Code you used to create, initialize, and push a portfolio repo to GitHub:

Set up Git user locally

git config --global user.name "$USERNAME"
git config --global user.email "$EMAIL"

Creating local repository

git init
git add .
git commit -m "First commit"
git remote add origin https://remote_repository_URL
git remote -v
git push -u origin master

To push my portfolio repo to GitHub for updates:

git add .
git commit -m "First commit"
git push

RMarkdown pretty html challenge

---
title: "Pretty_html"
author: "Alison Fong 33399149"
date: "version April 25, 2018"
output:
  html_document:
    toc: yes
---

R Markdown PDF Challenge

The following assignment is an exercise for the reproduction of this .html document using the RStudio and RMarkdown tools we’ve shown you in class. Hopefully by the end of this, you won’t feel at all the way this poor PhD student does. We’re here to help, and when it comes to R, the internet is a really valuable resource. This open-source program has all kinds of tutorials online.

[Image: PhD Comics strip]

http://phdcomics.com/ Comic posted 1-17-2018

Challenge Goals

The goal of this R Markdown html challenge is to give you an opportunity to play with a bunch of different RMarkdown formatting. Consider it a chance to flex your RMarkdown muscles. Your goal is to write your own RMarkdown that rebuilds this html document as close to the original as possible. So, yes, this means you get to copy my irreverent tone exactly in your own Markdowns. It’s a little window into my psyche. Enjoy =)

hint: go to the PhD Comics website to see if you can find the image above
If you can’t find that exact image, just find a comparable image from the PhD Comics website and include it in your markdown

Here’s a header!

Let’s be honest, this header is a little arbitrary. But show me that you can reproduce headers with different levels please. This is a level 3 header, for your reference (you can most easily tell this from the table of contents).

Another header, now with maths

Perhaps you’re already really confused by the whole markdown thing. Maybe you’re so confused that you’ve forgotten how to add. Never fear! A calculator R is here:

1231521+12341556280987
## [1] 1.234156e+13

Table Time

Or maybe, after you’ve added those numbers, you feel like it’s about time for a table!
I’m going to leave all the guts of the coding here so you can see how libraries (R packages) are loaded into R (more on that later). It’s not terribly pretty, but it hints at how R works and how you will use it in the future. The summary function used below is a nice data exploration function that you may use in the future.

library(knitr)
kable(summary(cars),caption="I made this table with kable in the knitr package library")
I made this table with kable in the knitr package library

    speed            dist
Min.   : 4.0    Min.   :  2.00
1st Qu.:12.0    1st Qu.: 26.00
Median :15.0    Median : 36.00
Mean   :15.4    Mean   : 42.98
3rd Qu.:19.0    3rd Qu.: 56.00
Max.   :25.0    Max.   :120.00

And now you’ve almost finished your first RMarkdown! Feeling excited? We are! In fact, we’re so excited that maybe we need a big finale eh? Here’s ours! Include a fun gif of your choice!

YAAAAY

Plotting Data in R

Exercise 1

library(tidyverse)
## ── Attaching packages ────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 2.2.1     ✔ purrr   0.2.4
## ✔ tibble  1.4.2     ✔ dplyr   0.7.4
## ✔ tidyr   0.8.0     ✔ stringr 1.2.0
## ✔ readr   1.1.1     ✔ forcats 0.2.0
## ── Conflicts ───────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(phyloseq)
library(dplyr)

metadata_new = read.table(file="Saanich.metadata.txt", header=TRUE, row.names=1, sep="\t", na.strings="NAN")
OTU_new = read.table(file="Saanich.OTU.txt", header=TRUE, row.names=1, sep="\t", na.strings="NAN")
load("phyloseq_object.RData")

ggplot(metadata_new, aes(x=PO4_uM, y=Depth_m)) +
  geom_point(shape=17, color="purple")

Exercise 2

metadata_new %>% 
  select(matches("Temp"))
##              Temperature_C
## SI072_S3_010        12.854
## SI072_S3_020        11.005
## SI072_S3_040         9.536
## SI072_S3_060         8.540
## SI072_S3_075         8.480
## SI072_S3_085         8.538
## SI072_S3_090         8.599
## SI072_S3_097         8.647
## SI072_S3_100         8.703
## SI072_S3_110         8.727
## SI072_S3_120         8.796
## SI072_S3_135         8.882
## SI072_S3_150         9.002
## SI072_S3_165         9.041
## SI072_S3_185         9.091
## SI072_S3_200         9.117
metadata_new %>% 
  mutate(Temperature_F = Temperature_C*1.8+32) %>%
  ggplot() + geom_point(aes(y=Depth_m, x=Temperature_F))

Exercise 3

physeq_percent = transform_sample_counts(physeq, function(x) 100 * x/sum(x))
plot_bar(physeq_percent, fill="Genus") + 
  geom_bar(aes(fill=Genus), stat="identity") + ggtitle("Genus Percentages") + xlab("Sample Depth") + ylab("Percent Relative Abundance")

Exercise 4

table_5_1= metadata_new %>% select(Depth_m, O2_uM, PO4_uM, SiO2_uM, NO3_uM, NH4_uM, NO2_uM)
table_5_2= table_5_1 %>% gather (Nutrients, Concentration, O2_uM, PO4_uM, SiO2_uM, NO3_uM, NH4_uM, NO2_uM)
ggplot(table_5_2, aes(x=Depth_m, y=Concentration)) +
  geom_point() + geom_line()+ facet_wrap(~Nutrients, scales="free_y") +
  theme(legend.position="none")

Origins and Earth Systems

Evidence worksheet 01

Whitman et al 1998

Learning objectives

Describe the numerical abundance of microbial life in relation to ecology and biogeochemistry of Earth systems.

General questions

  • What were the main questions being asked?

What is the number of prokaryotes and the total amount of their cellular carbon on Earth?

  • What were the primary methodological approaches used?

Aquatic environments: estimates of cell density, volume, and carbon
Soil: estimates from detailed direct counts from representative soils; estimates from other papers and unpublished field studies of E. A. Paul for cultivated soils
Subsurface: calculation of arithmetic averages to create a depth profile; extrapolation from formulas in published papers

2 other approaches: (1) Assuming average porosity of the terrestrial subsurface is 3% (2) Estimation from groundwater data

Other habitats:
For animals - using the number of prokaryotes in each individual animal and the population size of the animal
For leaves - leaf area estimated from leaf area index; assuming a dense population
For air - estimates from references

To estimate carbon content in prokaryotes: estimations using cell numbers; using average dry weight of cells; average cellular carbon

  • Summarize the main results or findings.

Number of prokaryotes is estimated to be 4-6 x 10^30 cells
Prokaryotes’ cellular carbon on Earth is estimated to be 350-550 Pg of C
Total amount of prokaryotic carbon = 60-100% of the estimated total carbon in plants, so inclusion of prokaryotic carbon in global models will almost double estimates of the amount of carbon stored in living organisms

Earth’s prokaryotes contain 85-130 Pg of N and 9-14 Pg of P

Number of prokaryotes in: (1) Open ocean: 1.2 x 10^29 cells (2) Soil: 2.6 x 10^29 cells (3) Oceanic subsurfaces: 3.5 x 10^30 cells (4) Terrestrial subsurfaces: 0.25-2.5 x 10^30 cells

Average turnover times of heterotrophic prokaryotes in: (1) Upper 200 m of open ocean: 6-25 days (2) Ocean below 200 m: 0.8 year (3) Soil: 2.5 years (4) Subsurface: 1-2 x 10^3 years

Cellular production rate for all prokaryotes on Earth is estimated to be 1.7 x 10^30 cells per year; highest in open ocean

  • Do new questions arise from the results?

How does the carbon content of prokaryotes interact with carbon in the environment?

How is carbon from relatively inaccessible sources cycled through the carbon cycle?

How does prokaryotic abundance play a role in the total metabolic potential of the ecosystem? What is their significance?

  • Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?

No supplemental figures were included; math was not shown. Tables were easy to understand, but I wasn’t sure how or where they got the numbers from. A lot of the methods were calculations based on assumptions (see methods question above). Enough evidence was supplied to reach the conclusion; it was more a question of whether the evidence it was based on was valid and accurate. Indeed, papers were referenced, but we might have to go through each individual referenced paper to convince ourselves their values were accurate and reliable. I was a little confused about how the different habitats and depths of the Earth are linked together; it may be helpful to provide a depth diagram or a habitat diagram to show exactly which areas of the Earth we are talking about and calculating for.

Problem set 01

Whitman et al 1998

Learning objectives

Describe the numerical abundance of microbial life in relation to the ecology and biogeochemistry of Earth systems.

Specific questions:

  • What are the primary prokaryotic habitats on Earth and how do they vary with respect to their capacity to support life? Provide a breakdown of total cell abundance for each primary habitat from the tables provided in the text.

From Table 5:
Aquatic habitats - 1.2 x 10^29 cells (From Table 1, large population ≠ cell density)
Oceanic subsurface - 3.55 x 10^30 cells
Soil - 2.6 x 10^29 cells
Terrestrial subsurface - 25-250 x 10^28 cells

  • What is the estimated prokaryotic cell abundance in the upper 200 m of the ocean and what fraction of this biomass is represented by marine cyanobacterium including Prochlorococcus? What is the significance of this ratio with respect to carbon cycling in the ocean and the atmospheric composition of the Earth?

Estimated prokaryotic cell abundance in the upper 200 m = 3.6 x 10^28 cells
Upper 200 m - cellular density ~ 5 x 10^5 cells/ml
Prochlorococcus - cellular density ~ 4 x 10^4 cells/ml
Autotrophs = 2.9 x 10^27 cells
(4 x 10^4)/(5 x 10^5) x 100 = 8% Prochlorococcus
Significance: because only 8% of prokaryotic cells in the upper 200 m of the ocean are Prochlorococcus, there must be high turnover of Prochlorococcus in order to support the carbon that’s cycling in the ocean; Prochlorococcus is the main source of carbon in the ocean
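
The 8% figure can be double-checked in a few lines of R. This is only a sketch using the numbers quoted in this answer; the variable names are my own and are not taken from Whitman et al.

```r
# Cell densities quoted above for the upper 200 m of the ocean (cells/ml)
total_density <- 5e5             # all prokaryotes
prochlorococcus_density <- 4e4   # Prochlorococcus only

# Percentage of prokaryotic cells that are Prochlorococcus, from densities
percent_by_density <- prochlorococcus_density / total_density * 100

# Same percentage from total cell counts (autotrophs / all prokaryotes)
total_cells <- 3.6e28
autotroph_cells <- 2.9e27
percent_by_count <- autotroph_cells / total_cells * 100

print(round(percent_by_density, 1))
print(round(percent_by_count, 1))
```

Both routes land at roughly 8%, which suggests the density ratio and the cell counts in the table are consistent with each other.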

  • What is the difference between an autotroph, heterotroph, and a lithotroph based on information provided in the text?

Autotroph - uses inorganic carbon as carbon source (incorporated into their own cells, not just used as part of metabolism; Ex. CO2); self-nourishing; fixes inorganic carbon (CO2) into biomass
Heterotroph - uses organic carbon as carbon source (incorporated into their own cells, not just used as part of metabolism); assimilates organic carbon
Lithotroph - uses inorganic chemicals (e.g. minerals, iron) as e- source; uses inorganic substrates; assimilates and metabolizes inorganic substrates and releases energy

  • Based on information provided in the text and your knowledge of geography what is the deepest habitat capable of supporting prokaryotic life? What is the primary limiting factor at this depth?

Deep habitats supporting life:
Subsurface -> terrestrial = 4 km; marine = deepest subsurface is 9-10 km from sea level
Primary limiting factor = temperature (avg temperature at this depth is 125˚C, which is close to the upper temperature limit for prokaryotic life); ∆˚C ~ 22˚C/km

  • Based on information provided in the text and your knowledge of geography what is the highest habitat capable of supporting prokaryotic life? What is the primary limiting factor at this height?

Atmospheric -> 57-77 km above sea level (realistic boundary = 20 km above sea level)
Primary limiting factor = nutrients and temperature; very humid with UV ionization in upper atmosphere

  • Based on estimates of prokaryotic habitat limitation, what is the vertical distance of the Earth’s biosphere measured in km?

Assuming the top of Mt. Everest (8.8 km above sea level) as the upper boundary and 4 km of subsurface beneath the deepest marine habitat (about 10 km below sea level) as the lower boundary: 8.8 km + 10 km + 4 km ≈ 23 km

  • How was annual cellular production of prokaryotes described in Table 7 column four determined? (Provide an example of the calculation)

Population size / (turnover time in days / 365 days per year) = cells/year
Ex. In marine heterotrophs: (3.6 x 10^28 cells)/(16 days/365 days per year) = 8.2 x 10^29 cells/year

Viruses carry accessory metabolic genes; these are protein-encoding genes that play a role in cytometabolism (they’re not just there for viral replication, but they can also influence the metabolic network within the cell and essentially reprogram the cell). Thus, cells are information circuit boards that can actually be reprogrammed by viruses!
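
The Table 7 calculation can be written as a small R helper. This is a sketch only: the function name `annual_production` and its argument names are hypothetical, and the inputs are the marine heterotroph values quoted above.

```r
# Annual cellular production = population size / turnover time (in years)
annual_production <- function(population_cells, turnover_days) {
  turnover_years <- turnover_days / 365   # convert days to years
  population_cells / turnover_years       # cells produced per year
}

# Marine heterotrophs: 3.6 x 10^28 cells turning over every 16 days
marine_heterotrophs <- annual_production(population_cells = 3.6e28,
                                         turnover_days = 16)
print(signif(marine_heterotrophs, 2))
```

Swapping in the population size and turnover time of another habitat should give the corresponding production estimate, assuming the same formula applies to that row of the table.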

  • What is the relationship between carbon content, carbon assimilation efficiency and turnover rates in the upper 200m of the ocean? Why does this vary with depth in the ocean and between terrestrial and marine habitats?

Carbon assimilation efficiency is assumed to be 0.2 (or 20%) in the paper (held constant), so the amount of “net productivity” necessary to support turnover of prokaryotes in the upper 200 m of the ocean is 4 times their carbon content, or 0.7-2.9 Pg of C. Assuming 85% of net productivity is consumed in the upper 200 m and all of this carbon is used by prokaryotes, the average turnover rate cannot exceed 15-60 yr^-1

To calculate, we need to know the total number of cells and the amount of carbon per cell
Carbon per cell: up to 20 fg C/cell (average ~ 10 fg C/cell); 1 fg = 10^-30 Pg, so 20 fg C/cell = 2 x 10^-29 Pg C/cell
Total number of cells = 3.6 x 10^28 cells
3.6 x 10^28 cells x 2 x 10^-29 Pg C/cell = 0.72 Pg C in marine heterotrophs
Used a multiplier of 4 in the paper -> 4 x 0.72 = 2.88 Pg C needed per turnover
Net productivity = 51 Pg C/year -> 85% is consumed, that’s ~ 43 Pg C/year
(43 Pg C/year)/(2.88 Pg C per turnover) = 14.9 turnovers/year, or 1 turnover every 24.5 days
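
The chain of unit conversions above is easy to get wrong by hand, so here is a rough R sketch of it. The variable names are my own; the inputs are the values quoted in this answer (20 fg C/cell, the multiplier of 4, 51 Pg C/year, 85% consumed), and the only outside fact used is that 1 fg = 10^-30 Pg.

```r
# Unit conversion: 1 fg = 1e-15 g and 1 Pg = 1e15 g, so 1 fg = 1e-30 Pg
fg_to_pg <- 1e-30

cells <- 3.6e28              # prokaryotes in the upper 200 m
carbon_per_cell_fg <- 20     # fg C per cell

# Standing stock of carbon in these cells (Pg C)
standing_carbon_pg <- cells * carbon_per_cell_fg * fg_to_pg

# Carbon needed to replace the population once (multiplier of 4 from the paper)
carbon_per_turnover_pg <- 4 * standing_carbon_pg

# Net productivity consumed in the upper 200 m (Pg C per year)
available_pg_per_year <- 0.85 * 51

# Maximum number of turnovers per year, and days per turnover
turnovers_per_year <- available_pg_per_year / carbon_per_turnover_pg
days_per_turnover <- 365 / turnovers_per_year

print(standing_carbon_pg)
print(round(turnovers_per_year, 1))
print(round(days_per_turnover, 1))
```

Without rounding 85% of 51 down to 43, the sketch gives about 15.1 turnovers per year (roughly 24 days each), which is close to the hand calculation.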

  • How were the frequency numbers for four simultaneous mutations in shared genes determined for marine heterotrophs and marine autotrophs given an average mutation rate of 4 x 10^-7 per DNA replication? (Provide an example of the calculation with units. Hint: cell and generation cancel out)

Mutation rate = 4 x 10^-7 mutations/generation
Take it to the power of 4 (for 4 simultaneous mutations) -> 2.56 x 10^-26 mutations/generation
We need to know the turnover rate (how quickly the cells regenerate themselves; how many generations per year?) -> 3.6 x 10^28 cells; 365 days/16 days = 22.8 turnovers/year (this is the turnover rate)
(3.6 x 10^28 cells) x (22.8 turnovers/year) = 8.2 x 10^29 cells/year in the ocean
(8.2 x 10^29 cells/year) x (2.56 x 10^-26 mutations/generation) = 2.1 x 10^4 mutations/year = 0.4 hours/mutation
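
The same calculation as an R sketch, using only the numbers from this answer (the variable names are my own). It treats the four mutations as independent events, which is what raising the rate to the fourth power assumes.

```r
mutation_rate <- 4e-7        # mutations per gene per DNA replication
n_simultaneous <- 4          # four mutations in the same cell at once

# Probability of all four mutations in a single replication
p_four <- mutation_rate^n_simultaneous

# Marine heterotrophs: cells produced per year (population x turnovers/year)
population <- 3.6e28
turnover_days <- 16
cells_per_year <- population * (365 / turnover_days)

# Expected four-fold mutation events per year, and hours between events
events_per_year <- cells_per_year * p_four
hours_between_events <- (365 * 24) / events_per_year

print(signif(p_four, 3))
print(signif(events_per_year, 2))
print(round(hours_between_events, 1))
```

Changing `turnover_days` or `population` to the marine autotroph values should reproduce the other frequency in the table, assuming the same method was used for both.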

Four simultaneous mutations are rare, but this calculation shows that they still occur frequently (point mutations). There is a whole other mobile aspect to the microbial genome that is drastically more rapid than even this background mutation rate; when you have large population sizes, almost anything is possible

  • Given the large population size and high mutation rate of prokaryotic cells, what are the implications with respect to genetic diversity and adaptive potential? Are point mutations the only way in which microbial genomes diversify and adapt?

The large population size and high mutation rate of prokaryotic cells suggest that prokaryotic cells are able to adapt and evolve quickly. An example of this phenomenon is observed in antibiotic resistance. Besides point mutations, microbial genomes are also diversified via insertion, deletion, recombination, horizontal gene transfer (in which new genes are acquired) and epigenetics (in which the environment turns genes on and off). These mechanisms not only allow prokaryotes to be extremely diverse, but also allow them to adapt to virtually any kind of habitat.

  • What relationships can be inferred between prokaryotic abundance, diversity, and metabolic potential based on the information provided in the text?

High prokaryotic abundance suggests that prokaryotic metabolic potential may have an extensive impact on Earth. High prokaryotic diversity suggests that there may be a wide variety of metabolic potential/capacities.

Evidence Worksheet 02

Kasting & Siefert, 2002

Learning objectives

Comment on the emergence of microbial life and the evolution of Earth systems

  • Indicate the key events in the evolution of Earth systems at each approximate moment in the time series. If times need to be adjusted or added to the timeline to fully account for the development of Earth systems, please do so.

Hadean

  • 4.6 billion years ago - formation of solar system, inner planets received water vapor and carbon
  • 4.5 billion years ago - formation of moon, which gave Earth its spin and tilt, day/night cycles and seasons
  • 4.5 - 4.1 billion years ago - high levels of CO2 increased temperature during the time of the weak, early sun
  • 4.4 billion years ago - formation of zircon (the oldest mineral)
  • 4.4 - 4.1 billion years ago - meteorite impacts
  • 4.1 billion years ago - first evidence of life in graphite in zircon and carbon isotopes
  • 4.0 billion years ago - Acasta gneiss (oldest rock, from Canada) and evidence of plate subduction

Archaean

  • 3.8 billion years ago - bombardments halted, oceans formed, evidence of life from sedimentary rocks and methanogenesis
  • 3.75 billion years ago - evidence of photosynthesis
  • 3.5 billion years ago - microfossils and stromatolites present, global oxygenic photosynthesis, evidence of isotopic partition of C from carbonate (most likely Rubisco)
  • 3.5 - 2.7 billion years ago - cyanobacteria photosynthesize
  • 2.7 billion years ago - great oxidation event (responsible for glaciation)

Proterozoic

  • 2.5 - 1.5 billion years ago - red rock beds containing iron oxide observed, evidence of oxidation
  • 1.7 billion years ago - appearance of eukaryotes
  • 1.1 billion years ago - Snowball Earth occurs

Phanerozoic

  • 540 million years ago - Cambrian explosion (increased diversity of life and larger organisms/land plants observed)
  • 400 million years ago - Devonian explosion (fish, cephalopods, corals observed)
  • 250 million years ago - Permian extinction (95% of species go extinct), gigantism of organisms
  • 240 million years ago - evidence of dinosaurs
  • 200 million years ago - Triassic-Jurassic extinction
  • 65 million years ago - Cretaceous/Paleogene extinction

  • Describe the dominant physical and chemical characteristics of Earth systems at the following waypoints:

    • Hadean - abundance of CO2 present in minerals to keep the Earth warm since the sun was weak back then; Earth was mostly molten rock due to constant bombardment (generally hot and dry; oceans couldn’t form)

    • Archean - abundance of CH4 in atmosphere to keep the Earth warm; oceans were able to form, but early sun is still dim (~30% dimmer); methanogenesis kept Earth from freezing; some O2 present due to evolution of photosynthesis

    • Proterozoic - oxygenated atmosphere; in the atmosphere, O2 reacted with CH4 to produce CO2, which led to a net decrease in greenhouse gas effects, causing glaciation on Earth; O2 oxidized iron into banded iron formations as seen in sedimentary rock

    • Phanerozoic - very high O2 levels in atmosphere correlated to gigantism; plants started to evolve; coal deposits formed from organisms that died during extinctions and were stored in sediments; occasional glaciation

Problem set 02

Falkowski et al 2008

Learning objectives

Discuss the role of microbial diversity and formation of coupled metabolism in driving global biogeochemical cycles.

Specific Questions:

  • What are the primary geophysical and biogeochemical processes that create and sustain conditions for life on Earth? How do abiotic versus biotic processes vary with respect to matter and energy transformation and how are they interconnected?

Primary geophysical processes include: Plate tectonics and atmospheric photochemical processes, which allow the redistribution and reaction between different chemical species and nutrients (geochemical cycles). Primary biogeochemical processes include: Geochemical reactions based on acid/base chemistry; rock weathering, which drives nutrient cycles on Earth (Ex. By removing CO2 to allow cellular respiration); volcanism and microbial-catalyzed redox reactions, which are important for the cycling of major bioelements C, H, O, N, S, and P. Abiotic processes are mainly acid-base reactions based on H ions and mainly affect C, S, P levels. Biotic processes are mainly redox reactions based on electrons and mainly affect C, H, O, N, S levels.

  • Why is Earth’s redox state considered an emergent property?

Earth’s redox state is considered an emergent property because abiotic and biotic processes, and microbial metabolic and geochemical processes, form feedback loops that create the overall redox condition of the Earth’s oceans and atmosphere. These different processes and organisms work together to complete metabolic pathways and sustain major cycles, such as the carbon cycle or the nitrogen cycle.

  • How do reversible electron transfer reactions give rise to element and nutrient cycles at different ecological scales? What strategies do microbes use to overcome thermodynamic barriers to reversible electron flow?

As organisms live in close proximity to each other in communities and populations, electrons may be passed among different taxonomic groups via shared metabolites, nutrients, and wastes, thus giving rise to element and nutrient cycles. To overcome thermodynamic barriers to reversible electron flow, microbes may reduce substrate concentrations to favor the production of substrates to maintain equilibrium, by Le Chatelier’s Principle. Microbes may also work with other organisms, in which one organism provides the energy or products that can be used by another organism to carry out the reverse reaction. Microbes may also create an environment that favors the reverse reaction. For example, methane oxidizing organisms favor methane oxidation in the presence of hydrogen-consuming sulfate reducers, which keep hydrogen concentration low. In the case where thermodynamic conditions are unfavorable for the microbe, the overall metabolic pathway may still be present, although on a different level.

  • Using information provided in the text, describe how the nitrogen cycle partitions between different redox “niches” and microbial groups. Is there a relationship between the nitrogen cycle and climate change?

The nitrogen cycle involves many different redox niches and microbial groups, some of which are listed here:
  • Nitrogen fixation changes N2 into NH4 so nitrogen becomes accessible to organisms for the synthesis of proteins and nucleic acids (nitrogenase is inhibited by O2)
  • In the presence of O2, NH4 is oxidized to NO2 or NO3 by nitrifying bacteria. During these reactions, CO2 is also often reduced into organic matter
  • In the absence of O2, NO2 or NO3 are used as electron acceptors, in which NO2 or NO3 are anaerobically reduced to N2 by microbes

There is an interdependent relationship between the nitrogen cycle and climate change. On one hand, nitrifying bacteria, when oxidizing NH4, also reduce atmospheric CO2 through carbon fixation and assimilation. This reduces the greenhouse effect. On the other hand, climate change - such as changes in sunlight availability - affects the nitrogen cycle by changing the activity levels of photosynthetic organisms that use nitrogen oxides as terminal electron acceptors (TEAs).

  • What is the relationship between microbial diversity and metabolic diversity and how does this relate to the discovery of new protein families from microbial community genomes?

An increase in metabolic diversity is often, but not always, correlated with an increase in microbial diversity, since the presence of certain metabolic processes does not always indicate the presence of specific microbes. Horizontal gene transfer allows microbes of different species to transfer genes and metabolic pathways among each other, and selective environmental pressures allow for the retention and expression of these newly transferred genes. As new genes are introduced to microbes and environmental pressures select for their expression, new protein families arise from these microbial community genomes and the community’s metabolic diversity increases, although the microbial diversity may not necessarily increase.

  • On what basis do the authors consider microbes the guardians of metabolism?

Due to microbial methods of sharing/transferring genes (such as horizontal gene transfer) and environmental selection for the retention of these genes (boutique genes), microbes may be considered the guardians of metabolism. These mechanisms allow fundamental metabolic processes to be widespread across many species of microbes, which act as ferries that “protect” these metabolic processes through different environmental hardships and through long periods of time. Extinction of individual microbial species will not threaten metabolic diversity, since there tend to be other microbial species that can still provide the reactions needed to complete the metabolic pathway and continue the survival of the core metabolic gene sets.

Evidence Worksheet 03

Waters et al 2016

Learning objectives

Evaluate human impacts on the ecology and biogeochemistry of Earth systems

General questions

  • What were the main questions being asked?

What are some anthropogenic markers of functional changes in the Earth system in the stratigraphic record that render the Anthropocene stratigraphically distinct from the Holocene and earlier epochs?

  • What were the primary methodological approaches used?

As this article is a review, it did not include any methodological approaches used by the primary literature whose findings it summarized.

  • Summarize the main results or findings.

The driving human forces responsible for many anthropogenic signatures are a product of three factors: accelerated technological development, rapid increase of the human population, and increased consumption of resources. Recent anthropogenic deposits in the stratigraphic record include pottery, glass, bricks, copper alloys, elemental aluminum, concrete, new organic polymers (plastics), black carbon, inorganic ash spheres, and spherical carbonaceous particles. Distinct geochemical signatures introduced by human activities into the sedimentary record include increased concentrations of polyaromatic hydrocarbons, polychlorinated biphenyls, and diverse pesticide residues. Nitrogen and phosphorus in soils have doubled in the past century due to increased fertilizer usage. Industrial metals such as cadmium, chromium, copper, mercury, nickel, lead, and zinc show a global pattern of dispersion in the environment. The start of the Anthropocene may be defined by GSSA coinciding with nuclear weapons testing, as shown by elevated levels of radioisotopes C-14 and Pu-239. There also appears to be an increase in temperature, an increase in O-18 ratio in ice in Greenland and increasing global sea levels that are significantly higher than Holocene levels. Evolution and extinction rates are too slow to provide an obvious biological marker for the start of the Anthropocene, but species assemblages and relative abundances have significantly altered worldwide due to trends of habitat loss and overexploitation; if these trends are maintained, it would likely push Earth into the sixth mass extinction event.

  • Do new questions arise from the results?

How should the Anthropocene be defined? By the GSSA (calendar age), or by the GSSP (reference point in a stratal section), or by a combination of both? When does the Anthropocene formally begin? Is it helpful to formalize the Anthropocene, or is it better to leave it as an informal geological time term, as the Precambrian and Tertiary currently are?

  • Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?

I thought the purpose of the review was to identify specific anthropogenic evidence in the stratigraphic record that would support marking the Anthropocene as distinct from earlier epochs; however, the paper also went into detail regarding what made the Holocene distinct, which I thought was a little off topic, though it was interesting. The review also went into biological evidence which I thought was a little off topic, since I got the impression that the review was mainly focused on stratigraphic and geographical evidence. The graphs were all easy to understand and relevant to the discussion.

Writing assessment 01

“Microbial life can easily live without us; we, however, cannot survive without the global catalysis and environmental transformation it provides.” Do you agree or disagree with this statement? Answer the question using specific references to your reading, discussions and content from evidence worksheets and problem sets.

Introduction

In the last 500 million years, five mass extinctions have occurred in which more than three-quarters of all living species have ceased to exist. Microbes, on the other hand, have existed on Earth for at least 3 billion years, since long before the beginning of human evolution, and still remain with us today, for good reason. Besides their extraordinary ability to adapt to harsh and extreme environmental conditions - even through global glaciation events and meteorite bombardments - which, in conjunction with mutations and horizontal gene transfer, has contributed to their taxonomic and metabolic diversity, microbes are responsible for setting the stage for the existence of essentially all living creatures since their time. In other words, we owe our existence to these innumerable invisible life forms that produce the air we breathe and the food we eat, and decompose the waste we produce. In this essay, I argue that humans are unable to survive without the global catalysis and environmental transformation microbes provide, due to our inability to replace their regulation of geochemical processes and waste cycles, as well as our inability to deal with environmental change without them.

Microbes and Geochemical Cycles

Microbes are the main biological drivers of our planet’s major geochemical cycles. Falkowski et al. (2008) describe the redox state of the Earth as an emergent property of microbial activity, since the fluxes of five of the six major elements - C, H, N, O, and S (which are the major elemental make-up for all biological macromolecules) - are mainly driven by redox reactions catalyzed by microbes. This is not to suggest that any one species of microbe completes an entire metabolic pathway on its own; rather, multiple species of microbes (often spatially or temporally separated) usually work together to complete a complex pathway. Using the nitrogen cycle as an example, atmospheric nitrogen N2 is only made accessible for the synthesis of amino acids and nucleic acids via nitrogen fixation, a biological process that reduces N2 to NH4+. In anaerobic environments, this process is catalyzed by a highly conserved enzyme complex, nitrogenase, found in cyanobacteria. In aerobic environments following the reduction of N2 to NH4+, a specific group of Bacteria or Archaea oxidizes NH4+ to NO2-, which is then oxidized to NO3- by a different group of Bacteria. In the absence of oxygen, a different group of microbes uses NO2- or NO3- as electron acceptors for respiratory pathways, producing N2 as a by-product and thus closing the nitrogen cycle. Furthermore, it is important to note that microbes are required for decomposition and recycling of nutrient waste, including nitrogenous waste, in order to release nitrogen back into its cycle. Microbes do this through various processes such as fermentation and methanogenesis.

Oxygen is a required element for human survival, as it is used as the terminal electron acceptor in the electron transport chain during respiration for the synthesis of ATP. Unsurprisingly, microbes are responsible for first producing the atmospheric oxygen we breathe today. According to Kasting and Siefert, cyanobacteria are believed to be responsible for the initial rise of atmospheric oxygen around 2.3 billion years ago, during the Great Oxidation Event (2002). It was cyanobacteria that were responsible for first “inventing” oxygenic photosynthesis - the process all humans rely on to live and breathe today. It is easy to see how we are dependent on the geochemical cycles driven by microbial redox reactions. If our waste cannot be metabolized to usable sources of nitrogen, our crops won’t grow; and if our crops and trees won’t grow, photosynthesis by microbes alone will not be able to support the oxygen needs of all of humanity.

Microbes and the Anthropocene

The beginning of the Industrial Revolution is believed to have marked the beginning of a new epoch - the Anthropocene, an age in which human-driven ecological and environmental changes are distinct from those of the Holocene and previous epochs (Waters et al., 2016). Due to rapid technological advancement, industrialization, agriculture, and energy use, human activity has changed the face of the environmental landscape. Between 1999 and 2010, CO2 was released into the atmosphere 100 times faster than the quickest emission during the last glacial meltdown; concentrations now sit at 400 ppm, levels that well exceed those of the Holocene epoch (Waters et al., 2016). Distinct geochemical signatures introduced by human activities into the sedimentary record include increased concentrations of polyaromatic hydrocarbons, polychlorinated biphenyls, and diverse pesticide residues (Waters et al., 2016). Nitrogen and phosphorus in soils have doubled in the past century due to increased fertilizer usage; and industrial metals such as cadmium, chromium, copper, mercury, nickel, lead, and zinc now show a global pattern of dispersion in the environment (Waters et al., 2016). CH4 and N2O concentrations have recently been increasing due to agricultural activities, contributing to the atmospheric greenhouse effect and, ultimately, global warming (Kasting & Siefert, 2002). It is projected that if greenhouse gases continue to be released into the atmosphere at this rate, by 2070 the Earth will be the hottest it has been since the emergence of the human species 200,000 years ago (Waters et al., 2016). Temperatures and global sea levels already appear to be rising beyond those of the Holocene epoch (Waters et al., 2016). Humans cannot hope to survive without a solution to the environmental issues we’ve caused. If these trends continue, it is likely Earth will undergo a sixth mass extinction event.

Microbes, Us, and Sixth Mass Extinction

The genetic basis of microbes’ ability to drive the metabolic pathways that regulate geochemical cycles did not evolve abruptly; rather, it is through many years of mutation, horizontal gene transfer, and extensive environmental selection that these diverse genes and metabolic processes have arisen (Falkowski et al., 2008). Prokaryotes, unlike Eukaryotes, are mostly haploid and asexual, and thus hold a greater capacity to make use of mutations as a genetic source of metabolic diversity that may even become biochemical solutions to human-driven environmental issues in the future (Whitman et al., 1998). Keeping in mind the complexity of metabolic pathways driven by multiple microbes and the constant emergence of new metabolic pathways through mutations, it would be a long time before humans are able to understand microbial activity well enough to completely replace it. Any previous attempts to replace microbial activity, such as our attempt to use the Haber-Bosch process to replace nitrogen fixation, have been met with the question of where the substrates, such as CO2, came from. Should humanity attempt to use the Haber-Bosch process to fix all the Earth’s nitrogen without the help of microbes, our source of CO2 would be depleted in a matter of decades. It is apparent that humans won’t be able to survive just by replacing one aspect of a geochemical cycle; rather, we must be able to replace each and every metabolic step before we could even consider surviving without microbes. Furthermore, as human-driven global warming, habitat loss, and overexploitation worsen, it will be crucial for us to understand and use the biogeochemical processes of microbes to prevent what Dan Rothman calls Earth’s “sixth mass extinction”.

Conclusion

Three billion years ago, microbes came into being and brought about an oxygenic environment on Earth in which living things can thrive and expand. Today, microbes have not only evolved through extreme historical environmental pressures to preserve and “invent” new metabolic processes, but they also continue to maintain the balance in global biogeochemical cycles. As the Anthropocene epoch continues - during which forests may be destroyed, greenhouse gases produced, and poor agricultural methods may leave traces that change the balance of the environment - it is likely that Earth will head towards a sixth mass extinction, unless our current way of life changes. However, microbes have proved their resilience through time via mechanisms such as mutations and horizontal gene transfer. Not only are they able to protect their core metabolic functions, but they’re also able to “invent” new metabolic processes. Millions of microbes have yet to be discovered, and knowledge of their metabolic activity would be a priceless addition to humanity. These seemingly invisible life forms already have the tools to keep the Earth in balance, but tools don’t use themselves. They are in need of a user with a direction and a purpose. It is now up to us to discover these tools and apply them widely to return our world to a healthy state.

Module 01 references

Whitman WB, Coleman DC, Wiebe WJ. 1998. Prokaryotes: The unseen majority. Proc Natl Acad Sci USA. 95(12): 6578-6583.

Falkowski PG, Fenchel T, Delong EF. 2008. The microbial engines that drive Earth’s biogeochemical cycles. Science 320(5879): 1034-1039.

Kasting JF, Siefert JL. 2002. Life and the Evolution of Earth’s Atmosphere. Science 296: 1066-1068.

Waters CN, Zalasiewicz J, Summerhayes C, Barnosky AD, Poirier C, Galuszka A, Cearreta A, Edgeworth M, Ellis EC, Ellis M, Jeandel C, Leinfelder R, McNeill JR, Richter DD, Steffan W, Syvitski J, Vidas D, Wagreich M, Williams M, Zhisheng A, Grinevald J, Odada E, Oreskes N, Wolfe AP. 2016. The Anthropocene is functionally and stratigraphically distinct from the Holocene. Science 351(6269): aad2622.

Module 02

Module 02 portfolio check

  • Evidence worksheet_04
    • Completion status:
    • Comments:
  • Problem Set_03
    • Completion status:
    • Comments:
  • Writing assessment_02
    • CANCELLED
  • Additional Readings
    • Completion status:
    • Comments

Remapping the body of the world

Evidence worksheet 04

Martinez et al 2007

Learning objectives

  • Discuss the relationship between microbial community structure and metabolic diversity
  • Evaluate common methods for studying the diversity of microbial communities
  • Recognize basic design elements in metagenomic workflows

General questions

  • What were the main questions being asked?

What is the physiological basis (genetics and biochemistry, structure and function) of a proteorhodopsin photosystem?

What is the minimal gene cluster that can be transferred between microbes to allow for the ubiquity of PR photosystems among diverse microbial taxa?

  • What were the primary methodological approaches used?

The HOT_10m fosmid library was used for screening. High-density colony macroarrays (12,280 clones of the HOT_10m library) were prepared on a Performa II filter by using a Q-PixII robot. Filters were used to help with the visual detection of color against the white background. Colonies were screened visually for an orange or red phenotypic color. Fosmid DNA from positive clones was retransformed into fresh E. coli EPI300 and rescreened, and then sequenced using primers T7 and EpiFos5R. To obtain the full DNA sequence of the putative PR photosystem fosmid clones, the clones to be characterized underwent random in vitro transposition by using the EZ-Tn5 insertion kit. This sequencing approach allowed rapid DNA sequencing while simultaneously providing a set of precisely located insertion mutants for phenotypic analysis of specific gene functions. For carotenoid extraction, overnight cultures of clones were prepared, and cells were harvested in darkness or low light to prevent carotenoid photooxidation. Carotenoids were identified via HPLC analysis. Cultures of clones to be analyzed for proton-pumping activity were prepared, and ATP was measured using a luciferase-based assay. An ATP standard curve was generated and used to calculate the concentration of ATP in samples.
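
As a rough illustration of the standard-curve step described above, the sketch below fits a straight line to a set of ATP standards and then inverts it to estimate the ATP concentration of an unknown sample. The concentrations, luminescence readings, and the helper name estimate_atp are all made up for illustration; they are not data or code from Martinez et al., and a linear response is assumed.

```r
# Hypothetical ATP standards: known concentrations (nM) and luminescence (RLU).
# These numbers are invented for illustration, not taken from the paper.
standards <- data.frame(
  atp_nM = c(0, 10, 50, 100, 500),
  rlu    = c(5, 105, 505, 1005, 5005)
)

# Fit luminescence as a linear function of ATP concentration
fit <- lm(rlu ~ atp_nM, data = standards)

# Invert the fitted line to estimate ATP from a luminescence reading
# (estimate_atp is a hypothetical helper, assuming a linear response)
estimate_atp <- function(rlu, model) {
  b <- coef(model)
  unname((rlu - b["(Intercept)"]) / b["atp_nM"])
}

# Estimated ATP concentration (nM) for an unknown sample reading 2505 RLU
sample_atp <- estimate_atp(2505, fit)
sample_atp
```

In practice the standards would be measured alongside the samples, and readings outside the range of the standards should not be extrapolated.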

  • Summarize the main results or findings.

After screening the fosmid library, 3 colonies were identified as potential PR-expressing clones based on the fact that all three showed no pigmentation in the absence of the high-copy number inducer AND all three showed an orange phenotype in the absence of L-retinal when induced to high copy number. 2 of the 3 clones were sequenced. Both clones appear to be derived from other PR-containing BAC clones from Alphaproteobacteria from the Mediterranean and Red Seas. Both PR genes analyzed encoded proteins with a glutamine residue at position 105, which is characteristic of blue light-absorbing PRs. Adjacent to the PR gene in both clones was a predicted six-gene operon encoding enzymes involved in β-carotene and retinal biosynthesis. These genes include crtE, crtI, crtB, crtY, blh, and idi. Transposon insertion mutants disrupted in the PR gene showed no orange pigmentation, and HPLC analysis showed low levels of retinal in these extracts. Transposon insertion mutants in crtE, crtB, and crtI showed no pigmentation, as expected, since lycopene (the first colored product in the biosynthetic pathway) would not have been formed. crtY insertion mutants were pink, however, and subsequent pigment analysis confirmed that they were accumulating lycopene, although they did not synthesize retinal or β-carotene. Light-dependent decreases in pH were observed in PR+ clones only. No light-dependent proton-translocating activity was observed in mutants unable to synthesize retinal (CrtY- or Blh-). Idi- mutants had normal proton-pumping activity. CCCP abolished the light-driven decrease in pH and, subsequently, photophosphorylation. DCCD did not affect external pH changes, but it abolished photophosphorylation.

  • Do new questions arise from the results?

The authors argue that any microbe capable of synthesizing FPP could readily acquire the PR photosystem, but does this actually occur in the natural environment from a single lateral transfer? If so, how often and how readily does it occur?

The approach taken by the authors, where PR photosystem recombinants could be detected visually when the fosmid vector was induced to high copy number, was not completely effective in detecting all targeted genotypes. Is there a more effective method to detect ALL PR-containing clones known to exist in the fosmid library?

  • Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?

The methods were very logical, although the approach that the authors took did not account for distant genes, enzymes, and intermediates from other pathways that may affect the functionality of the photosystem. Personally, I would have found it helpful if they had elaborated on the details of their methods a little more, for my own understanding. The results and figures were clearly explained. The conclusions and discussion were well supported by their own results as well as the results of previous studies.

Problem set_03

Wooley et al 2010 Madsen 2005

Learning objectives:

Specific emphasis should be placed on the process used to find the answer. Be as comprehensive as possible e.g. provide URLs for web sources, literature citations, etc.

Specific Questions:

  • How many prokaryotic divisions have been described and how many have no cultured representatives (microbial dark matter)?

By 2016, 89 bacterial phyla and 20 archaeal phyla had been described via small subunit (16S) rRNA databases, and it is estimated that up to 1,500 bacterial phyla may exist. In 2003, half (26 of 52) had cultured representatives; in 2013, half (30 of 60) still had no cultured representatives (Rinke et al., 2013). The take-home message here is that most life is uncultured.

  • How many metagenome sequencing projects are currently available in the public domain and what types of environments are they sourced from?

Thousands of metagenome sequencing projects are currently available in the public domain; the number is always changing, but there are about 110,217 on the EBI database. These projects are sourced from ALL environments (sediments, soil, aquatic environments, etc.), especially those whose communities are hard to culture.

The Joint Genome Institute currently has about 51,500 metagenome sequencing projects available sourced from the following environments: Activated Sludge, Aerobic, Agricultural field, Anaerobic, Anaerobic digestor, Aquaculture, Ascidians, Asteroidea, Bacteria, Beetle, Biochar, Bone, Breviatea, Brown Algae, Bryozoans, Canal, Circulatory system, City, Cnidaria, Composting, Continuous culture, Ctenophora, Currency notes, Dairy products, Deep subsurface, Defined, Defined media, Digestive system, Dinoflagellates, Endosphere, Engineered product, Excretory system, Eye, Fermented beverages, Fermented seafood, Fermented vegetables, Freshwater, Fungi, Gastrointestinal tract, Genetic cross, Geologic, Green algae, House, Hydrocarbon, Indoor Air, Industrial wastewater, Integument, Intracellular endosymbionts, Lab synthesis, Landfill, Leaf, Lichen, Lymphatic, Lymphatic system, Mammals, Marine, Meat products, Microbial enhanced oil recovery, Microbial solubilization of coal, Mixed alcohol bioreactor, Mollusca, Mycelium, Nematoda, Nervous system, Nodule, Non-marine Saline and Alkaline, Nutrient removal, Oil refinery, Oil reservoir, Oomycetes, Outdoor Air, Peat, Peat moss, Persistent organic pollutants (POP), Phylloplane, Phyllosphere, Plant litter, Platyhelminthes, Red algae, Reproductive system, Respiratory system, Rhizome, Rhizoplane, Rhizosphere, Rock-dwelling (subaerial biofilms), Roots, Sediment, Shell, Silage fermentation, Simulated communities (DNA mixture), Simulated communities (microbial mixture), Simulated communities (sequence read mixture), Skin, Soil, Solar panel, Solid animal waste, Spacecraft Assembly Cleanrooms, Sponge, Symbiotic fungal gardens and galleries, Tailings pond, Tetrachloroethylene and derivatives, Thermal springs, Thermal springs, Thiocyanate, Tissue, Tunicates, Unclassified, Undefined media, Volcanic, Water treatment plant, Whole body, Wood.

  • What types of on-line resources are available for warehousing and/or analyzing environmental sequence information (provide names, URLS and applications)?

Shotgun metagenomics (IMG/M, MG-RAST, NCBI/EBI):

  • Assembly fitting annotation pipelines - EULER
  • Binning - S-GCOM
  • Annotation - KEGG
  • Analysis pipeline - MEGAN6

Marker gene metagenomics:

  • OTU clustering - UCLUST, CD-HIT
  • Analysis pipeline - SILVA (the gold standard for 16S right now)
  • Denoising - AmpliconNoise, PyroNoise
  • Chimera detection - UCHIME, ChimeraSlayer, Perseus, Decipher
  • Database - Ribosomal Database Project

Microgene metagenomics - TUBE BASE?

Review articles can be used to compare published sequences.

  • What is the difference between phylogenetic and functional gene anchors and how can they be used in metagenome analysis?

Phylogenetic anchors are genes characteristic of a taxonomic unit due to the accumulation of mutations through evolution and vertical gene transfer. They are inherited vertically, carry phylogenetic information that allows tree reconstruction and taxonomic assignment, and are ideally single-copy.

Functional anchors are genes that encode proteins characteristic of metabolic pathways. They are more often subject to horizontal gene transfer, identify specific biogeochemical functions associated with measurable effects, and are not as useful for phylogeny.

  • What is metagenomic sequence binning? What types of algorithmic approaches are used to produce sequence bins? What are some risks and opportunities associated with using sequence bins for metabolic reconstruction of uncultivated microorganisms?

Metagenomic sequence binning is the process of grouping sequences or sequence reads that come from, in theory, a single genome. One can decide how to group those sequences based on different qualities. For example, if one has a database of pre-existing genomes of similar organisms, one can map the sequences onto the genomes and group them according to which genome they map to, or group them according to properties of the sequences themselves (e.g. GC content, codon usage, etc.).

Types of algorithms:

  • Align sequences to a database
  • Group sequences with each other based on DNA characteristics (GC content, codon usage)

Issues for filling bins:

  • Incomplete coverage of the genome: there may not be a representative genome, and key parts may be missing
  • If a genome from another organism has similar enough properties, it may contaminate a bin (putting non-belonging genes into the genome) - contamination from a different phylogeny (5-10% contamination is acceptable)
  • Genetic variation within a species may cause sequences from the same species to be binned separately. This is not ideal, since each bin should account for species-wide variation.

Uncultivated organisms:

  • Opportunities: Binning sequences from uncultivated organisms allows the prediction of the potential metabolic functions of those organisms
  • Risks: Sequences from uncultivated organisms may accidentally be put in the wrong bin, leading to contamination
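
To make the composition-based approach above more concrete, here is a minimal sketch that bins contigs by GC content alone. The contig sequences, the bin names, and the 0.5 cutoff are hypothetical choices for illustration; real binning tools combine several signals (e.g. codon or k-mer usage and coverage) rather than a single threshold.

```r
# Fraction of G and C bases in a DNA sequence
gc_content <- function(dna) {
  bases <- strsplit(toupper(dna), "")[[1]]
  sum(bases %in% c("G", "C")) / length(bases)
}

# Hypothetical contigs (made-up sequences, for illustration only)
contigs <- c(
  contig_1 = "ATATTAATTTAAGCAT",
  contig_2 = "GCGCGGCCGCGGATGC",
  contig_3 = "TTTAAATATAGATTTA",
  contig_4 = "CCGGGCGCGCTAGGCC"
)

# GC content of every contig
gc <- sapply(contigs, gc_content)

# Assign each contig to a bin; the 0.5 cutoff is an arbitrary assumption
bins <- ifelse(gc < 0.5, "low_GC_bin", "high_GC_bin")

# List the contigs that ended up in each bin
split(names(contigs), bins)
```

A contig from an unrelated organism with a similar GC content would land in the same bin here, which is exactly the contamination risk noted above.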

  • Is there an alternative to metagenomic shotgun sequencing that can be used to access the metabolic potential of uncultivated microorganisms? What are some risks and opportunities associated with this alternative?

An alternative to metagenomic shotgun sequencing that can be used to access the metabolic potential of uncultivated microorganisms is 3rd generation sequencing (e.g. Oxford Nanopore). Since 3rd generation sequencing reads long nucleotide sequences from a single molecule, it does not require shearing and amplification as 2nd generation technologies do. Single cell sequencing may also be used as an alternative, in which cells are separated and sequenced individually. Both 3rd generation sequencing and single cell sequencing significantly reduce the risk of binning errors and contamination. However, as 3rd generation sequencing is still under development, it may have a higher error rate and a lower degree of genome completeness. Other alternatives include functional screening via biochemical assays, in which we culture microbes in growth media that specifically support the desired metabolic function, and imaging techniques such as FISH, with which the presence of conserved DNA or RNA can be observed.

Module 02 references

Madsen EL. 2005. Identifying microorganisms responsible for ecologically significant biogeochemical processes. Nat Rev Microbiol. 3(5):439-46.

Martinez A, Bradley AS, Waldbauer JR, Summons RE, Delong EF. 2007. Proteorhodopsin photosystem gene expression enables photophosphorylation in a heterologous host. Proc Natl Acad Sci USA. 104(13):5590-5.

Wooley JC, Godzik A, Friedberg I. 2010. A primer on metagenomics. PLoS Comput Biol. 6(2):e1000667.

Module 03

Module 03 portfolio check

  • Evidence worksheet_05
    • Completion status:
    • Comments:
  • Problem set_04
    • Completion status:
    • Comments:
  • Writing Assessment_03
    • Completion status:
    • Comments:
  • Additional Readings
    • Completion status:
    • Comments

Microbial species concepts

Evidence worksheet 05

Welch et al 2002

Part 1: Learning objectives

  • Evaluate the concept of microbial species based on environmental surveys and cultivation studies
  • Explain the relationship between microdiversity, genomic diversity and metabolic potential
  • Comment on the forces mediating divergence and cohesion in natural microbial communities

Part 1: General questions

  • What were the main questions being asked?

What is the genome sequence of Escherichia coli CFT073 and how does it compare with the genome sequences of the enterohemorrhagic E. coli strain EDL933 and the nonpathogenic laboratory strain MG1655? How do genetics confer pathogenicity and evolutionary diversity to different strains of E. coli?

  • What were the primary methodological approaches used?

Whole-genome libraries were prepared from E. coli CFT073 genomic DNA. Random clones were sequenced by dye-terminator chemistry, and data were collected on Applied Biosystems ABI377 and 3700 automated sequencers. Sequence data were assembled by SEQMANII, annotated by MAGPIE, had ORFs defined by GLIMMER, and were sequence-matched using BLAST. Gene orthology was assessed between each CFT073 gene and a gene from another strain of E. coli; if the similarity of both genes was greater than 90%, the gene would not be matched elsewhere. DNA sequences were assembled from a shotgun library of CFT073 DNA fragments, using PCR-based techniques and primer walking experiments for finishing. A whole-genome XhoI restriction fragment optical map allowed the confirmation of contig structure during circular assembly of the genome.

Codon usage analysis was used to find out whether different patterns of usage occur between backbone and island genes, since distinctive codon usages are a hallmark of lateral gene transfer.
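
As a small illustration of what a codon usage comparison involves, the sketch below counts codons in two in-frame coding sequences and converts the counts to relative frequencies. The two sequences and the function names are hypothetical (they are not genes or methods from Welch et al.); both sequences encode the same short peptide but use different leucine, lysine, and glutamate codons.

```r
# Count how often each codon occurs in an in-frame coding sequence
codon_counts <- function(dna) {
  dna <- toupper(dna)
  n_codons <- nchar(dna) %/% 3            # ignore any trailing partial codon
  starts <- seq(1, by = 3, length.out = n_codons)
  codons <- substring(dna, starts, starts + 2)
  table(codons)
}

# Relative frequency of each codon within one sequence
codon_frequencies <- function(dna) {
  counts <- codon_counts(dna)
  counts / sum(counts)
}

# Hypothetical sequences: same peptide (M-L-L-K-E-L-stop), different codons
backbone_gene <- "ATGCTGCTGAAAGAACTGTAA"
island_gene   <- "ATGTTATTAAAGGAGTTATAA"

codon_frequencies(backbone_gene)
codon_frequencies(island_gene)
```

A real analysis would pool all backbone ORFs and all island ORFs before comparing the frequency profiles, since a single gene is far too short to show a reliable pattern.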

  • Summarize the main results or findings.

CFT073 genome sequencing resulted in a circular 5,231,428-bp chromosomal sequence with sevenfold coverage. No virulence plasmids were found in CFT073. Five cryptic prophage genomes are found in the CFT073 chromosome, but none has enough genetic information to produce viable phage. The CFT073 genome is 590,209 bp longer than MG1655 and similar in size to EDL933. Comparisons showed that over 70% of the ORFs previously thought to be unique to either MG1655 or EDL933 are replaced with new genes unique to CFT073. Only 39.2% of their combined set of proteins are common to all three strains.

Codon usage pattern in EDL933 backbone ORFs was indistinguishable from CFT073 backbone ORFs. CFT073-specific islands contain 2,004 genes of which 204 are also found in EDL933-specific genes. Although the E. coli backbone is evolutionarily conserved through VGT, there are many differences in the pathogenicity islands that may have been acquired by HGT. There are many differences between the large pathogenicity islands of MG1655 and CFT073, for example.

CFT073 lacks genes for a type III secretion system and the phage/plasmid-encoded virulence genes that are common to other E. coli isolates; however, the CFT073 genome contains genes that contribute to its ability to colonize different niches of the urinary tract and to cause disease, including genes encoding 10 fimbriae of the chaperone-usher family, 2 type IV pili, 2 pap operons, and the foc operon. The CFT073 genome is especially rich in genes that encode fimbrial adhesins, autotransporters, iron-sequestration systems, and phase-switch recombinases, which also contribute to pathogenesis in the urinary tract. In addition, a Type II general secretory pathway for chitinase secretion is found in CFT073, but is absent from the EDL933 genome.

  • Do new questions arise from the results?

How do we define “species” aside from using phenotypic and genotypic traits, taking into account the frequent, environmentally driven gain and loss of genomic sequences?

  • Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?

It seems like some of the important information is only presented in the graphs, but not elaborately stated in the paper. This appears to be a problem, because as an undergraduate student studying microbiology, I don’t have special expertise in reading graphs related to genomics. Figure 3, for example, was particularly confusing to me since it took me a while to orient myself on the graph; in addition, I wasn’t sure if the length of the vertical lines meant anything. It might have been beneficial if the authors included more figures with more detailed descriptions.

Part 2: Learning objectives

  • Comment on the creative tension between gene loss, duplication and acquisition as it relates to microbial genome evolution
  • Identify common molecular signatures used to infer genomic identity and cohesion
  • Differentiate between mobile elements and different modes of gene transfer

Part 2: Specific questions

  • Based on your reading and discussion notes, explain the meaning and content of the following figure derived from the comparative genomic analysis of three E. coli genomes by Welch et al. Remember that CFT073 is a uropathogenic strain and that EDL933 is an enterohemorrhagic strain. Explain how this study relates to your understanding of ecotype diversity. Provide a definition of ecotype in the context of the human body. Explain why certain subsets of genes in CFT073 provide adaptive traits under your ecological model and speculate on their mode of vertical descent or gene transfer.

Figure 3

The figure above shows the locations and sizes of CFT073 and EDL933 islands. The vertical axis shows island sizes (all islands >4kb are shown), and the horizontal axis shows the position of the islands in the colinear backbone. 60 islands >4kb are shown in the CFT073 backbone, and 57 islands >4kb are shown in the EDL933 backbone. As we can see in the figure above, many islands share the same locations on the backbones of CFT073 and EDL933, but this does NOT necessarily mean the contents of these islands are similar as well.

An ecotype is a distinct species occupying a specific environment/niche/habitat. In the context of the human body, an ecotype may be defined based on the niche the microbe fits into in the microbiota (in terms of the environment/habitat it lives in, and the metabolic roles it plays). Since there are many different unique microenvironments found in and on the human body, there is a high diversity of ecotypes that are adapted to live in/on the human body. Drawing from specific information presented in this study, besides the core essential backbone genes transmitted vertically, which are usually shared among ecotypes, accessory genes/genomic islands transferred horizontally, which are different and unique to each species or strain, promote the ability of the organism to survive and adapt to specific microenvironments, as well as develop its pathogenicity. For example, special adhesins found in CFT073 prevent the organism from being flushed away in the urinary tract; these genes are found on genomic islands obtained through HGT and are specific to the CFT073 strain.

Problem set 04

Learning objectives:

  • Gain experience estimating diversity within a hypothetical microbial community

Outline:

In class Day 1:

  1. Define and describe species within your group’s “microbial” community.
  2. Count and record individuals within your defined species groups.
  3. Remix all species together to reform the original community.
  4. Each person in your group takes a random sample of the community (i.e. divide up the candy).

Assignment:

  1. Individually, complete a collection curve for your sample.
  2. Calculate alpha-diversity based on your original total community and your individual sample.

In class Day 2:

  1. Compare diversity between groups.

Part 1: Description and enumeration

Obtain a collection of “microbial” cells from “seawater”. The cells were concentrated from different depth intervals by a marine microbiologist travelling along the Line-P transect in the northeast subarctic Pacific Ocean off the coast of Vancouver Island British Columbia.

Sort out and identify different microbial “species” based on shared properties or traits. Record your data in this Rmarkdown using the example data as a guide.

Once you have defined your binning criteria, separate the cells using the sampling bags provided. These operational taxonomic units (OTUs) will be considered separate “species”. This problem set is based on content available at What is Biodiversity.

For example, load in the packages you will use.

install.packages("kableExtra", repos = "http://cran.us.r-project.org")
#To make tables
library(kableExtra)
## Warning: package 'kableExtra' was built under R version 3.4.4
library(knitr)
#To manipulate and plot data
library(tidyverse)

Then load in the data. You should use a similar format to record your community data.

community_data = data.frame(
  number = c(1,2,3,4,5,6,7,8,9,10,11,12,13),
  name = c("Strings", "Gummy bear", "Sugar gummy bear", "Wine gummy", "Sugar Swirl", "Sugar bottle", "Sugar Octopus", "Mike-Ike", "Sphere", "Skittles", "Hershey Kiss", "M&M", "Lego"),
  characteristics = c("Red string", "Gummy bear", "Sugar-coated bear", "Wine gummy", "White swirl and sugar-coated", "sugar-coated bottle", "7-legged octopus and sugar-coated", "Ovoid chewy", "Spherical chewy", "Small fruit sugar chewy", "pyrimydal shape chocolate", "small chocolate coated with color", "lego-like hard sugar candy"),
  occurences = c(7, 16, 1, 2, 1, 1, 4, 25, 6, 26, 1, 40, 1))

Finally, use these data to create a table.

 community_data %>% 
  kable("html") %>%
  kable_styling(bootstrap_options = "striped", font_size = 10, full_width = F)
number | name | characteristics | occurences
1 | Strings | Red string | 7
2 | Gummy bear | Gummy bear | 16
3 | Sugar gummy bear | Sugar-coated bear | 1
4 | Wine gummy | Wine gummy | 2
5 | Sugar Swirl | White swirl and sugar-coated | 1
6 | Sugar bottle | sugar-coated bottle | 1
7 | Sugar Octopus | 7-legged octopus and sugar-coated | 4
8 | Mike-Ike | Ovoid chewy | 25
9 | Sphere | Spherical chewy | 6
10 | Skittles | Small fruit sugar chewy | 26
11 | Hershey Kiss | pyrimydal shape chocolate | 1
12 | M&M | small chocolate coated with color | 40
13 | Lego | lego-like hard sugar candy | 1

For your community:

  • Construct a table listing each species, its distinguishing characteristics, the name you have given it, and the number of occurrences of the species in the collection.
  • Ask yourself if your collection of microbial cells from seawater represents the actual diversity of microorganisms inhabiting waters along the Line-P transect. Were the majority of different species sampled or were many missed?
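
One way to put a number on the diversity of a community like the one tabulated above is an alpha-diversity index. The sketch below computes species richness plus the Shannon and Simpson indices in base R from the example candy counts. Which index the assignment expects is an assumption here; these are simply commonly used ones, and the variable names are for illustration.

```r
# Counts per candy "species", copied from the example community table
counts <- c(7, 16, 1, 2, 1, 1, 4, 25, 6, 26, 1, 40, 1)

# Species richness: number of species observed at least once
richness <- sum(counts > 0)

# Relative abundance of each observed species
p <- counts[counts > 0] / sum(counts)

# Shannon index: higher when there are more species and they are more even
shannon <- -sum(p * log(p))

# Simpson index (1 - D) and its reciprocal form (1 / D)
simpson <- 1 - sum(p^2)
inverse_simpson <- 1 / sum(p^2)

c(richness = richness, shannon = shannon,
  simpson = simpson, inverse_simpson = inverse_simpson)
```

The same calculation can be repeated on an individual sample by swapping in that sample's counts; the diversity() function in the vegan package offers these indices as well.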

Part 2: Collector’s curve

To help answer the questions raised in Part 1, you will conduct a simple but informative analysis that is a standard practice in biodiversity surveys. This analysis involves constructing a collector’s curve that plots the cumulative number of species observed along the y-axis and the cumulative number of individuals classified along the x-axis. This curve is an increasing function with a slope that will decrease as more individuals are classified and as fewer species remain to be identified. If sampling stops while the curve is still rapidly increasing then this indicates that sampling is incomplete and many species remain undetected. Alternatively, if the slope of the curve reaches zero (flattens out), sampling is likely more than adequate.

To construct the curve for your samples, choose a cell within the collection at random. This will be your first data point, such that X = 1 and Y = 1. Next, move consistently in any direction to a new cell and record whether it is different from the first. In this step X = 2, but Y may remain 1 or change to 2 if the individual represents a new species. Repeat this process until you have proceeded through all cells in your collection.
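The counting procedure above can be sketched in base R. This is a minimal illustration, assuming the individuals are recorded in the order they were classified; the vector `observed` is hypothetical example data, not the candy community.

```r
# Hypothetical individuals, listed in the order they were classified
observed <- c("Skittles", "M&M", "M&M", "Mike-Ike", "Skittles", "Sphere", "M&M")

# X: cumulative number of individuals classified
individuals <- seq_along(observed)

# Y: cumulative number of distinct species seen so far;
# !duplicated() is TRUE only the first time each species appears
species <- cumsum(!duplicated(observed))

collectors_curve <- data.frame(individuals, species)

# Step plot of the collector's curve
plot(collectors_curve$individuals, collectors_curve$species, type = "s",
     xlab = "Cumulative number of individuals classified",
     ylab = "Cumulative number of species observed")
```

Y only increases when a species appears for the first time, so the curve flattens once new individuals stop adding new species.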

For example, we load in these data.

names <- community_data$name
newdata <- as.data.frame(t(community_data))
newtable1 <- newdata[c(2,4),c(1:13)]
colnames(newtable1) <- names
newtable1 <- newtable1[2,]

newtable2 <- data.frame(
  Strings=7,
  Gummy_bear=16,
  Sugar_Gummy=1,
  Wine_Gummy=2,
  Sugar_Swirl=1,
  Sugar_Bottles=1,
  Sugar_Octopus=4,
  Mike_Ike=25,
  Sphere=6,
  Skittles=26,
  Hershey_Kiss=1,
  M_M=40,
  Lego=1
)

And then create a plot. We will use the rarecurve function from the vegan package, which draws a curve showing how the number of species observed grows as more individuals are sampled.

install.packages("vegan", repos = "http://cran.us.r-project.org")
## 
## The downloaded binary packages are in
##  /var/folders/7c/dzs6m9916vv53_38z2zwxr0h0000gn/T//RtmpEPWzls/downloaded_packages
library(phyloseq)
library(vegan)
## Warning: package 'vegan' was built under R version 3.4.4
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.5-1
rarecurve(newtable2, step = 13, xlab = "Cumulative Number of Individuals Classified", ylab = "Cumulative Number of Species Observed", label = TRUE)

For your sample:

  • Create a collector’s curve for your sample (not the entire original community).
  • Does the curve flatten out? If so, after how many individual cells have been collected? No, the curve doesn’t flatten out.
  • What can you conclude from the shape of your collector’s curve as to your depth of sampling? Since the slope of the curve does not reach zero, sampling may have been inadequate.

Part 5: Concluding activity

If you are stuck on some of these final questions, reading the Kunin et al. 2010 and Lundin et al. 2012 papers may provide helpful insights.

  • How does the measure of diversity depend on the definition of species in your samples?
  • Can you think of alternative ways to cluster or bin your data that might change the observed number of species?
  • How might different sequencing technologies influence observed diversity in a sample?

Writing assessment 03

Discuss the challenges involved in defining a microbial species and how HGT complicates matters, especially in the context of the evolution and phylogenetic distribution of microbial metabolic pathways. Can you comment on how HGT influences the maintenance of global biogeochemical cycles through time? Finally, do you think it is necessary to have a clear definition of a microbial species? Why or why not?

Introduction

The definition of “microbial species” has always been a controversial topic among microbiologists and may never reach a satisfactory consensus. Since the 70% DNA-DNA hybridization (DDH) criterion was established by Wayne et al. in 1987, the scientific community has gradually found that species concept lacking, mainly due to the presence of intra-species genomic variation that results in great phenotypic diversity. To complicate matters further, horizontal gene transfer (HGT), a mechanism by which different bacterial species transfer and share genetic material, obscures individual prokaryotic evolutionary lineages in the construction and use of phylogenetic trees. In addition to being a source of intra-species diversity, HGT plays an essential role in the maintenance of global biogeochemical cycles through time by ensuring the survival of core metabolic pathways. This essay will address the challenges and importance of defining a microbial species, with a special emphasis on the complications of HGT and its contributions to the maintenance of global biogeochemical cycles.

Challenges in Defining a Microbial Species

Although the establishment of the 70% DDH criterion allowed taxonomists to take a polyphasic approach towards the classification of microbial species, in which both phenotypic and genotypic characteristics are taken into account, there still remain limitations with this method of identification. This approach requires 70% DDH and the sharing of at least one diagnostic phenotypic trait in order for a collection of strains to be considered a species (Wayne et al., 1987). A DDH value of 70% corresponds to 97% sequence similarity of the 16S rRNA gene, the standard gene used for phylogenetic classification of microbes (Stackebrandt & Goebel, 1994). Since this approach allows for 30% genetic variability between strains of the same species, great intra-species variation and phenotypic diversity are observed within a microbial species. As a result, this low-resolution method of classification renders the 16S rRNA gene unreliable for classification below the genus level. The unreliability of the 16S rRNA gene for phylogenetic classification is further complicated by HGT. Coleman et al. (2006) found that phenotypically distinct Prochlorococcus strains that differ by less than 1% in 16S rRNA sequence show genetic variability mostly on genomic islands, which appear to have been acquired by HGT and are expressed differentially under specific environmental conditions. This suggests that HGT does not render a recipient bacterium more similar to the donor bacterium; rather, HGT offers the recipient the opportunity to adapt and thrive in a new niche, similar to the gaining of antibiotic resistance. Environmental selection of these newly transferred accessory genes on the genomic backbone may contribute to intra-species diversity and may support a continuum of genetic diversity, rather than clusters of genetic diversity (Grey & Williams, 1971).
Furthermore, not only does this allow two genetically similar bacteria to thrive in different niches and display different phenotypes, but it also allows the opposite, where two bacteria of very different genetic backgrounds may display a similar phenotype as a result of HGT. Clearly, these complications create ambiguity in deciding exactly where to draw the boundary between different microbial species.

New methods of sequence analysis and classification tests have been developed; however, these methods appear to have drawbacks when HGT is considered. Since the majority of bacteria are uncultured, environmental shotgun sequencing (ESS) used in conjunction with next generation sequencing offers a culture-independent method that extracts, sequences, and matches DNA and RNA fragments of non-cultured prokaryotes to those of cultured prokaryotes. However, given the error-prone nature of next generation sequencing, which generates noise, and the complication of HGT, ESS is often unable to distinguish between highly similar genomes, or to determine whether those similarities arose from HGT (Eisen, 2007). On the other hand, biochemical tests, when performed under standardized conditions, only show the phenotypic metabolic traits that allow the bacteria to thrive under those conditions; in other words, the true extent of the bacterium’s genetic diversity may not be observed. HGT also complicates this, since similar metabolic potentials may be transferred to a bacterium of a different species through DNA (Fraser et al., 2009).

HGT and the Maintenance of Global Biogeochemical Cycles

Despite the many complications HGT brings to the microbial species definition, it plays an essential role in preserving the core metabolic processes that maintain today’s global biogeochemical cycles. Thanks to HGT, core metabolic genes are now widespread among innumerable diverse groups of microbes. Microbes thus act as “guardians of metabolism”: because these pathways are distributed across diverse groups that inhabit different niches, mass extinction events such as global glaciation and bombardment events, or extreme environmental conditions, can wipe out certain groups of microbes without eliminating the core metabolic processes themselves. The preservation and spread of nitrogenase, the enzyme responsible for nitrogen fixation, among microbes is an important example and evolutionary event, as it produced an atmosphere that allowed the growth of organic matter such as plants. Originating from an Archaean source, nitrogenase was horizontally transferred to cyanobacteria and selected for by the lack of fixed nitrogen in the environment at the time. It is also important to recognize that HGT transfers not only individual genes, but at times entire metabolic pathways (Falkowski et al., 2008). It is through HGT that core metabolic genes could become so widespread, ubiquitous and protected, thereby allowing the Earth’s biogeochemical cycles to continue in peace.

The Importance in Defining a Microbial Species

Looking at the microbial species definition in isolation may seem relatively meaningless, as if it was nothing more than semantics enforced by fastidious scientists. However, when the microbial species definition is used in a more meaningful context, such as for the diagnosis of infectious disease agents, or for the interpretation of international regulations for transport and possession of pathogens, or for the reporting of bioterrorism agents, or for determination of quarantine, it’s obvious that an ambiguous simple definition may potentially cause some serious confusion and damage to people. All this fuss over what defines a microbial species is for the sake of one very important core component in any social arrangement - that is, communication. Having a detailed and unambiguous definition may not only help scientists relay their findings better, but also help health professionals give clear and precise explanations to their patients, thus reducing their anxiety and confusion.

Conclusion

The microbial species definition is unlikely to reach a consensus in the near future; however, despite all the controversy that surrounds the topic, it is clear that in order to generate a more sophisticated definition that better incorporates the diversity of microbes, more and higher-resolution sampling and genomic sequences must be obtained. In other words, in comparison to the world of microbial life that exists among us, we simply know too little to generate an unambiguous classification system. Perhaps there are even other mechanisms of genetic transfer besides HGT that we have not yet discovered. It would seem inefficient and illogical to wait until we have uncovered every mystery of the microbes before coming up with a microbial species definition. Perhaps what is more important is not a definition based on morphology, metabolic diversity or genetic composition. Perhaps all we need to communicate is how microbes interact with us, be it beneficially or harmfully; yet even that would require more information than genome sequences alone.

Module 03 references

Welch RA, Burland V, Plunkett G, Redford P, Roesch P, Rasko D, Buckles EL, Liou SR, Boutin A, Hackett J, Stroud D, Mayhew GF, Rose DJ, Zhou S, Schwartz DC, Perna NT, Mobley HLT, Donnenberg MS, Blattner FR. 2002. Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proc Natl Acad Sci USA. 99(26):17020-4.

Wayne LG, Brenner DJ, Colwell RR, Grimont PAD, Kandler O, Krichevsky MI, Moore LH, Moore WEC, Murray RGE, Stackebrandt E, Starr MP, Trüper HG. 1987. Report of the Ad Hoc Committee on Reconciliation of Approaches to Bacterial Systematics. Int J Syst Bacteriol. 37(4):463-464.

Stackebrandt E, Goebel BM. 1994. Taxonomic note: a place for DNA-DNA reassociation and 16S rRNA sequence analysis in the present species definition in bacteriology. Int J Syst Evol Micr. 44(4):846-9.

Coleman ML, Sullivan MB, Martiny AC, Steglich C, Barry K, Delong EF, Chisholm SW. 2006. Genomic islands and the ecology and evolution of Prochlorococcus. Science. 311:1768-1770.

Grey T, Williams S. 1971. Microbial productivity in soil. Symp. Soc. Gen. Microbiol. 21:255-286.

Eisen JA. 2007. Environmental Shotgun Sequencing: Its Potential and Challenges for Studying the Hidden World of Microbes. PLoS Biology. 5(3):384-388.

Fraser C, Alm EJ, Polz MF, Spratt BG, Hanage WP. 2009. The bacterial species challenge: making sense of genetic and ecological diversity. Science. 323:741-746.

Falkowski PG, Fenchel T, Delong EF. 2008. The microbial engines that drive Earth’s biogeochemical cycles. Science 320(5879): 1034-1039.

Project 1

  • CATME account setup and survey
    • Completion status: 0
    • Comments: Did not complete survey
  • CATME interim group assessment
    • Completion status: X
    • Comments:
  • Project 1
    • Report (80%):
    • Participation (20%):

Abstract

Writing

  • I wrote and synthesized the abstract

Editing

Introduction

Literature research

Writing

Editing

Methods

Writing

Editing

Results

Analysis

  • I did some analysis of the results (interpreted graphs and statistical significance)

Figures

  • I helped with the initial attempts of making the figures; troubleshooting, etc.

Writing

  • I interpreted graphs and wrote out the results under each specific question asked

Editing

Discussion

Literature research

Writing

Editing

  • I edited parts of the discussion for flow and accuracy

Project_1

Abstract

To study the diversity and biochemical responses of microbial communities in the context of oxygen minimum zones (OMZs), Saanich Inlet was used as a model ecosystem from which water samples were collected at seven major depths spanning the oxycline. A metagenomic study was conducted in which genomic DNA was extracted from the water samples, PCR amplified, assembled into contiguous sequences, and processed into operational taxonomic units (OTUs) and amplicon sequence variants (ASVs) using mothur and QIIME2 pipelines. Based on OTU and ASV results, we chose to focus in on Cyanobacteria as our taxon of interest. Both mothur and QIIME2 data produced alpha diversity values based on Shannon’s diversity index that suggest a decrease of Cyanobacterial abundance as depth increases. In addition, Cyanobacterial abundance significantly differs across depth and oxygen concentration according to both mothur and QIIME2 data; specifically, abundance significantly decreases at deeper depths and in environments with lower concentrations of oxygen. Within the Cyanobacteria phylum, there are 15 OTUs and 51 ASVs across all samples. Abundance of five OTUs from the mothur pipeline showed significant changes across both depth and oxygen. In contrast, abundance of none of the ASVs from the QIIME2 pipeline showed significant changes across depth, although 17 ASVs showed significant changes across oxygen concentrations. An increase of Cyanobacterial abundance at shallow depths may be explained in part by their ability to absorb red and orange light at the upper boundaries of the water column. This is also supported by the significant change in oxygen and chlorophyll A concentrations across the depth profile. 
Another explanation for low Cyanobacterial abundance at lower depths may be due to changes in temperature; where temperature drops below 10 °C at 100 m, Cyanobacterial growth stops completely.

Introduction

Oxygen minimum zones (OMZs) are areas in the ocean where dissolved oxygen concentrations fall below 20 \(\mu\)M (1). Due to temperature increases and other effects caused by global warming, OMZs are expanding at a notable rate. Saanich Inlet, a seasonally anoxic fjord off the coast of British Columbia, is a model ecosystem for studying the diversity and biochemical responses of microbial communities to the hypoxic environments commonly observed in OMZs (1, 2). In particular, Saanich Inlet has been used to model the metabolic coupling and symbiotic interactions that occur in OMZs (3). The inlet undergoes recurring cycles of water column stratification and deep water renewal, rendering it a model ecosystem for studying microbial responses to changes in ocean deoxygenation levels (4). Increased levels of primary productivity in ocean surfaces during the spring season, as well as the limited mixing which occurs between the basin and surface waters both result in the development of an anoxic body of water with increasing depth in the Inlet (2). These anoxic regions become populated with chemolithoautotrophs, and eventually lead to a decrease in aerobically respiring organisms found deeper within these zones. Past studies have demonstrated that these kinds of metabolic shifts generally lead to a loss of nitrogen along with the production of greenhouse gases, most notably methane (CH4) and nitrous oxide (N2O) (1).

In order to investigate the changes that occur in these OMZs, water samples of various depths were collected from Saanich Inlet. Genomic DNA was extracted from these samples to conduct a metagenomics study, which overcomes the barrier of uncultivability and enables a more thorough exploration of the relationship between the microbes and their communities based on the genetic distribution of metabolic processes (5, 6, 7, 8). The extracted DNA is sequenced to generate raw data, which can then be assembled into contiguous sequences. These contigs generated by amplicon sequencing are then compared to a sequence database to determine the microbial taxa present in the environment at each water depth. This involves processing the sequencing data, and there currently exist two approaches to this type of data analysis: operational taxonomic units (OTUs) and amplicon sequence variants (ASVs). OTU-based pipelines cluster reads that differ by less than a fixed dissimilarity threshold (9). This allows more data to be kept, although some of it may not be representative of the actual taxa in the community. In contrast, ASV-based pipelines resolve sequence variants by inferring the biological sequences present in the sample before amplification and sequencing errors were introduced, and are able to distinguish variants that differ by even one nucleotide (9). This treats each ASV as an individual species, though it has the potential to discard more data and to bias results towards sequences that are less error-prone.
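The difference between the two approaches can be illustrated with a toy example in base R. This is only a sketch under stated assumptions: the four reads are invented, OTUs are formed by average-linkage hierarchical clustering at a 3% dissimilarity cutoff, and counting exact unique sequences stands in for ASV inference. Real pipelines such as mothur and DADA2 do considerably more, including error modelling.

```r
# Invented 100-nt reference read (not from the Saanich samples)
base_read <- strsplit(strrep("ACGT", 25), "")[[1]]

# Helper (hypothetical): substitute the nucleotides at the given positions
mutate <- function(s, pos) {
  s[pos] <- chartr("ACGT", "CGTA", s[pos])
  s
}

reads <- list(
  a = base_read,
  b = mutate(base_read, 5),             # 1 mismatch relative to a
  c = mutate(base_read, 1:10),          # 10 mismatches relative to a
  d = mutate(base_read, c(1:10, 50))    # 1 mismatch relative to c
)

# Pairwise dissimilarity: fraction of positions that differ
n <- length(reads)
d <- matrix(0, n, n, dimnames = list(names(reads), names(reads)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- mean(reads[[i]] != reads[[j]])
  }
}

# OTU-style grouping: cluster, then cut the tree at 3% dissimilarity
otu <- cutree(hclust(as.dist(d), method = "average"), h = 0.03)
n_otus <- length(unique(otu))

# ASV-style counting (simplified): every exact sequence is its own variant
n_exact <- length(unique(vapply(reads, paste, character(1), collapse = "")))
```

Here reads a and b fall into one OTU and c and d into another, while exact-variant counting keeps all four apart, which is why ASV counts tend to exceed OTU counts for the same taxon.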

The objective of this paper is to analyze the data generated from both OTU and ASV pipelines in order to decide which produces more logical inferences, and ultimately to determine which pipeline would be preferred for future analysis of collected water samples. The taxon of interest selected for this comparison was the phylum Cyanobacteria. Cyanobacteria was selected because it contains enough OTUs and ASVs to make sound comparisons, but not so many that the analysis would be computationally infeasible.

Methods

Sampling

Samples were obtained on Saanich Inlet Cruise 72 and taken from seven major depths spanning the oxycline: 10, 100, 120, 135, 150, 165 and 200 m. Waters were filtered, and genomic DNA was extracted. Further sampling details can be found in (3).

DNA Sequencing

Samples were PCR amplified using the 515F and 808R primers, then sequenced according to the standard operating protocol on an Illumina MiSeq platform with Phred33 quality scores.

Data Processing

Sequences were processed using either mothur or QIIME2 as follows:

Mothur Pipeline: mothur was used to clean-up the data. In brief, paired end reads were combined into contigs using their overlapping regions. Low quality sequences, useless sequence data, chimeric sequences and singletons were removed. OTUs were then determined at 97% similarity. OTUs were classified using the SILVA databases, and the taxonomies for each OTU were condensed. The OTU table, taxonomy data and sample metadata were subsequently cleaned up and combined into a phyloseq object.

QIIME2 Pipeline: Demultiplexed sequences were imported into QIIME2 as manifest reads. QIIME2 was used to clean up the data and determine ASVs in one step. Sequence quality was visually evaluated, and sequence quality trimming was conducted. All other trimming/filtering parameters were left as default. ASV determination was completed using the DADA2 protocol. ASV classification was completed using the SILVA version 119 database at 99% similarity. The ASV table, taxonomy data and sample metadata were subsequently cleaned up and combined into a phyloseq object.

Data Analysis

The aforementioned phyloseq objects were imported into R version 3.4.3 (Windows) or 1.1.383 (Mac). The tidyverse, phyloseq, magrittr, knitr and cowplot packages were loaded and used to complete the data analysis. Data was piped into linear models and ANOVA tests to determine statistical significance at the 95% confidence level.
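As a rough sketch of what such a test looks like, the snippet below fits a linear model of abundance against depth and checks the slope's p-value against the 0.05 threshold. The formula mirrors the `Abundance ~ Depth_m` calls shown in the Results output, but the data frame `toy` contains made-up values for illustration only, not the Saanich Inlet measurements.

```r
# Made-up values for illustration only (not the Saanich Inlet data)
toy <- data.frame(
  Depth_m   = c(10, 100, 120, 135, 150, 165, 200),
  Abundance = c(180, 60, 45, 30, 20, 12, 5)
)

# Linear model of abundance as a function of depth
fit <- lm(Abundance ~ Depth_m, data = toy)

# Pull the slope and its p-value out of the coefficient table
coefs <- summary(fit)$coefficients
slope <- coefs["Depth_m", "Estimate"]
p_value <- coefs["Depth_m", "Pr(>|t|)"]

# Significance at the 95% confidence level
significant <- p_value < 0.05
```

A negative slope with a p-value below 0.05 would be read as abundance decreasing significantly with depth.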

Environment setup and Data Cleaning
## 
## Attaching package: 'magrittr'
## The following object is masked from 'package:purrr':
## 
##     set_names
## The following object is masked from 'package:tidyr':
## 
##     extract
## 
## The downloaded binary packages are in
##  /var/folders/7c/dzs6m9916vv53_38z2zwxr0h0000gn/T//RtmpEPWzls/downloaded_packages
## 
## Attaching package: 'cowplot'
## The following object is masked from 'package:ggplot2':
## 
##     ggsave
## You set `rngseed` to FALSE. Make sure you've set & recorded
##  the random seed of your session for reproducibility.
## See `?set.seed`
## ...
## 614OTUs were removed because they are no longer 
## present in any sample after random subsampling
## ...
## You set `rngseed` to FALSE. Make sure you've set & recorded
##  the random seed of your session for reproducibility.
## See `?set.seed`
## ...
## 6OTUs were removed because they are no longer 
## present in any sample after random subsampling
## ...

Results

  1. How does microbial community structure change with depth and oxygen concentration?

Alpha-diversity comparison:
Mothur
As shown in Figure 1, alpha-diversity measured by Shannon’s diversity index on the mothur data shows an overall decreasing trend as depth increases. Specifically, the index is stable from 0-100 m, decreases from 100-150 m and stabilizes from 150-200 m. Figure 1 also shows an overall increasing trend as oxygen increases. Specifically, Shannon’s diversity index increases from 2.3 to 4.25 at oxygen concentrations of 0-40 \(\mu\)M and decreases from 4.25 to 3.9 at oxygen concentrations of 40-220 \(\mu\)M. Shannon’s diversity index is also higher in oxic conditions (3.84 ± 0.45) than in anoxic conditions (2.39 ± 0.07).

QIIME2
The trends of Shannon’s diversity index across depth and oxygen based on QIIME2 data are similar to those of the mothur data. However, the QIIME2 pipeline produces higher Shannon’s diversity index values than mothur does. Shannon’s diversity index increases from 2.9 to 5.2 at oxygen concentrations of 0-40 \(\mu\)M and decreases from 5.2 to 5.1 at oxygen concentrations of 40-200 \(\mu\)M. Figure 2 shows that Shannon’s diversity index is higher in oxic conditions (4.80 ± 0.43) than in anoxic conditions (3.15 ± 0.18).
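For reference, Shannon’s diversity index is H = -sum(p_i * ln(p_i)), where p_i is the proportion of individuals belonging to taxon i. The function below is a minimal base-R sketch of that formula, assuming the natural logarithm; the name `shannon` and the example counts are illustrative and are not taken from the analysis code.

```r
# Shannon's diversity index from a vector of taxon counts
shannon <- function(counts) {
  # Drop absent taxa, since 0 * log(0) is treated as 0
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# Four equally abundant taxa give the maximum for four taxa, log(4)
even <- shannon(c(10, 10, 10, 10))

# The same four taxa with one dominant taxon give a lower index
skewed <- shannon(c(37, 1, 1, 1))
```

The index rises both with the number of taxa and with how evenly individuals are spread among them, so a drop in the index can reflect lost taxa, increased dominance, or both.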

Alpha-diversity of mothur data
## `geom_smooth()` using method = 'loess'

Table 1. Average and standard deviation of alpha-diversity by oxic/anoxic with mothur data
Statistic           Oxic       Anoxic
Average             3.8401008  2.3884700
Standard deviation  0.4523233  0.0666717

Alpha-diversity of QIIME2 data
## `geom_smooth()` using method = 'loess'

Table 2. Average and standard deviation of alpha-diversity by oxic/anoxic with QIIME2 data
Statistic           Oxic       Anoxic
Average             4.7959464  3.1546427
Standard deviation  0.4296851  0.1784873

Taxa presence and abundance:
Mothur
At the phylum level, 31 taxa are detected by the mothur pipeline (Figure 3). These taxa, however, differ widely in abundance. Among them, Proteobacteria is the predominant phylum in all samples, with the highest average abundance of over 75. In contrast, the phylum Peregrinibacteria has an abundance no larger than 0.001 in the seven samples (Figure 3). Additionally, different taxa show distinct changes in abundance across depth. For instance, both Thaumarchaeota and Verrucomicrobia reach their maximum abundance at a depth of 100 m and gradually decrease in abundance until 200 m, while Latescibacteria and Fibrobacteres are almost undetectable in shallow water and increase dramatically in abundance at a depth of 200 m.

QIIME2
The QIIME2 pipeline detects 29 known taxa and unknown taxa at the phylum level (Figure 4). Proteobacteria is still the most abundant phylum across samples. The taxa Chlorobi and Candidate division OP3 have the smallest abundances, no larger than 0.004. Different pipelines may produce different abundance patterns for the taxa shared by the mothur and QIIME2 data. Although the abundance of the phylum Actinobacteria changes across depth in the same way in both datasets, Chloroflexi abundance increases gradually with depth in the QIIME2 data, unlike in the mothur data, where abundance declines from 100 m to 135 m and then increases gradually until 200 m.

  2. Does your taxon of interest significantly differ in abundance with depth and/or oxygen concentration?

Mothur
The difference in cyanobacteria abundance across depth or oxygen was estimated with linear models using the mothur-processed data. The statistical results show that the abundance of cyanobacteria differs significantly with depth (p = 0.01263) and oxygen (p = 0.00012). However, the linear models in Figure 5 indicate opposite trends of cyanobacteria abundance across depth and oxygen: abundance decreases as depth increases and increases as oxygen concentration increases.

QIIME2
For data processed by the QIIME2 pipeline, ANOVA tests indicate that cyanobacteria abundance in the seven samples differs significantly across depth (p = 0.014) and oxygen (p = 0.013). The linear models in Figure 6 reveal that cyanobacteria abundance decreases in deeper water and in environments with less oxygen.

## 
## Call:
## lm(formula = Abundance ~ Depth_m, data = .)
## 
## Residuals:
##       1       2       6       4       5       3       7 
##  46.352 -50.700  20.041 -16.609  -3.284 -39.933  44.132 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 173.5309    39.3958   4.405  0.00699 **
## Depth_m      -1.0883     0.2864  -3.800  0.01263 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 42.31 on 5 degrees of freedom
## Multiple R-squared:  0.7428, Adjusted R-squared:  0.6914 
## F-statistic: 14.44 on 1 and 5 DF,  p-value: 0.01263
## 
## Call:
## lm(formula = Abundance ~ O2_uM, data = .)
## 
## Residuals:
##       1       2       6       4       5       3       7 
##   6.768 -17.048  19.375  -4.217  12.375 -22.627   5.375 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -5.3745     7.4690   -0.72 0.504007    
## O2_uM         0.9582     0.0885   10.83 0.000117 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16.87 on 5 degrees of freedom
## Multiple R-squared:  0.9591, Adjusted R-squared:  0.9509 
## F-statistic: 117.2 on 1 and 5 DF,  p-value: 0.0001167

## 
## Call:
## lm(formula = Abundance ~ Depth_m, data = .)
## 
## Residuals:
##        1        4        5        3        2        6        7 
##   63.270  160.404   72.980  -30.172 -221.273  -44.444   -0.766 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 687.7807   122.7658   5.602   0.0025 **
## Depth_m      -3.3051     0.8925  -3.703   0.0140 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 131.8 on 5 degrees of freedom
## Multiple R-squared:  0.7328, Adjusted R-squared:  0.6794 
## F-statistic: 13.71 on 1 and 5 DF,  p-value: 0.01395
## 
## Call:
## lm(formula = Abundance ~ O2_uM, data = .)
## 
## Residuals:
##         1         4         5         3         2         6         7 
##    0.5171  190.2269  105.9213   18.5371 -121.0450  -61.0787 -133.0787 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 159.0787    57.3201   2.775   0.0391 *
## O2_uM         2.5772     0.6792   3.794   0.0127 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 129.5 on 5 degrees of freedom
## Multiple R-squared:  0.7422, Adjusted R-squared:  0.6907 
## F-statistic:  14.4 on 1 and 5 DF,  p-value: 0.0127

  3. Within your taxon, what is the richness (number of OTUs/ASVs)?

Mothur
Across all samples, there are 15 OTUs within the cyanobacteria phylum. Table 3 presents the number of OTUs within the cyanobacteria phylum for each sample. Most samples contain 3-5 OTUs within cyanobacteria, except Saanich_120 and Saanich_200, which have only 1 and 0 cyanobacteria OTUs respectively.

QIIME2
There are 51 ASVs within the cyanobacteria phylum across all samples. The number of ASVs within the cyanobacteria phylum for each sample is shown in Table 3. Saanich_010, Saanich_120, and Saanich_135 have relatively high ASV numbers, equal to or over 15. However, it is important to note that there are no singletons within the ASV dataset. The function used to estimate richness, “estimate_richness” from the phyloseq library, is highly dependent on the number of singletons and warns of unreliable or wrong results when the data contain none.
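Observed richness itself is a simple count: the number of OTUs or ASVs with a non-zero count in a sample. The sketch below shows that calculation in base R on a hypothetical count table (rows are taxa, columns are samples); it is not the phyloseq call used in the analysis.

```r
# Hypothetical count table: rows are taxa, columns are samples
counts <- matrix(
  c(12, 0, 3,
     0, 0, 7,
     5, 1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("taxon_a", "taxon_b", "taxon_c"),
                  c("sample_1", "sample_2", "sample_3"))
)

# Observed richness: number of taxa present (count > 0) in each sample
observed_richness <- colSums(counts > 0)

# Richness across all samples: taxa present in at least one sample
total_richness <- sum(rowSums(counts) > 0)
```

Because a taxon can occur in several samples, the per-sample values need not add up to the total across all samples.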

Table showing richness of OTUs using mothur data and ASVs using QIIME2 data
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 15 taxa and 7 samples ]
## sample_data() Sample Data:       [ 7 samples by 22 sample variables ]
## tax_table()   Taxonomy Table:    [ 15 taxa by 7 taxonomic ranks ]
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 51 taxa and 7 samples ]
## sample_data() Sample Data:       [ 7 samples by 22 sample variables ]
## tax_table()   Taxonomy Table:    [ 51 taxa by 7 taxonomic ranks ]
## Warning in estimate_richness(., measures = c("Observed")): The data you have provided does not have
## any singletons. This is highly suspicious. Results of richness
## estimates (for example) are probably unreliable, or wrong, if you have already
## trimmed low-abundance taxa from the data.
## 
## We recommended that you find the un-trimmed data and retry.
Table 3. OTUs/ASVs across depth
Sample       Depth_m  OTU  ASV
Saanich_010       10    5   17
Saanich_100      100    4    8
Saanich_120      120    1   15
Saanich_135      135    4   17
Saanich_150      150    3   11
Saanich_165      165    3    5
Saanich_200      200    0    2
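
As a rough illustration of what the richness counts in Table 3 represent, the sketch below computes observed richness (the number of taxa with a non-zero count in each sample) in base R. The count matrix is hypothetical and is not the Saanich data. The singleton check (any count equal to exactly 1) is only an assumption about the kind of condition behind the phyloseq warning, not a copy of its internals.

```r
# Hypothetical count matrix: rows are taxa (OTUs/ASVs), columns are samples.
# These numbers are made up for illustration only.
counts <- matrix(
  c(12, 0, 3,
     0, 0, 5,
     7, 2, 0,
     2, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("Taxon", 1:4), c("S1", "S2", "S3"))
)

# Observed richness: number of taxa with a non-zero count in each sample.
observed_richness <- colSums(counts > 0)

# Simple singleton check (assumption): is any count in the table exactly 1?
has_singletons <- any(counts == 1)

print(observed_richness)
print(has_singletons)
```

A table with no counts of 1, like this toy matrix, is the situation in which richness estimators that rely on rare taxa become hard to trust.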
  1. Do the abundances of OTUs/ASVs within your taxon of interest change significantly with depth and/or oxygen concentration?

Using linear models, and after correcting the p-values for multiple comparisons, the abundances of OTUs 0189, 0658, 1104, 3852, and 4312 from the mothur pipeline within the phylum Cyanobacteria changed significantly with both depth and oxygen. Interestingly, none of the ASVs from the QIIME2 pipeline changed significantly in abundance across depth after p-value correction. However, 17 ASVs changed significantly in abundance with oxygen concentration.
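
The report does not name the correction method. The adjusted values in Table 4 are consistent with a Benjamini-Hochberg (FDR) adjustment over all 15 OTUs (for example, 0.0163987 × 15 / 5 ≈ 0.0492), so the sketch below assumes that method. The five raw p-values are copied from Table 4; the ten placeholder p-values for the remaining OTUs are hypothetical, because the report does not list them.

```r
# Five smallest raw p-values, copied from Table 4 (mothur, depth).
p_reported <- c(Otu0189 = 0.0155403, Otu0658 = 0.0155891,
                Otu1104 = 0.0163987, Otu3852 = 0.0163987,
                Otu4312 = 0.0163987)

# Placeholder p-values for the 10 remaining OTUs (hypothetical values;
# the real ones are not shown in the report).
p_other <- rep(0.5, 10)

# Benjamini-Hochberg adjustment across all 15 tests (assumed method).
p_adj <- p.adjust(c(p_reported, p_other), method = "BH")

# OTUs that remain significant at the 0.05 level after adjustment.
signif_otus <- names(p_reported)[p_adj[seq_along(p_reported)] < 0.05]

print(round(p_adj[seq_along(p_reported)], 7))
print(signif_otus)
```

Because the adjustment depends on the total number of tests, the same raw p-value can survive correction among 15 OTUs and fail among 51 ASVs.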

Mothur
Table 4. Correlation data of Cyanobacteria OTUs with significant differences across depth using mothur data
OTU       Estimate    Std. Error  t value    P_value    Adjusted_P
Otu0189   -0.9610475  0.2669442   -3.600181  0.0155403  0.0491962
Otu0658   -0.0685106  0.0190455   -3.597207  0.0155891  0.0491962
Otu1104   -0.0583306  0.0164341   -3.549356  0.0163987  0.0491962
Otu3852   -0.0159083  0.0044820   -3.549356  0.0163987  0.0491962
Otu4312   -0.0053028  0.0014940   -3.549356  0.0163987  0.0491962

Table 5. Correlation data of Cyanobacteria OTUs with significant differences across oxygen using mothur data
OTU       Estimate   Std. Error  t value   P_value    Adjusted_P
Otu0189   0.8586534  0.0788686   10.88714  0.0001136  0.0004035
Otu0658   0.0611354  0.0058159   10.51177  0.0001345  0.0004035
Otu1104   0.0522766  0.0049096   10.64789  0.0001264  0.0004035
Otu3852   0.0142572  0.0013390   10.64789  0.0001264  0.0004035
Otu4312   0.0047524  0.0004463   10.64789  0.0001264  0.0004035

QIIME2
## [1] "None of ASV has significantly different abundance acrossing depth with QIIME2 data"

Table 6. Correlation data of Cyanobacteria ASVs with significant differences across oxygen using QIIME2 data
ASV       Estimate   Std. Error  t value    P_value    Adjusted_P
Asv12     0.0855435  0.0080338   10.647891  0.0001264  0.0004605
Asv144    0.1568297  0.0147287   10.647891  0.0001264  0.0004605
Asv294    0.4104610  0.0429693    9.552421  0.0002128  0.0006784
Asv404    0.1948490  0.0182993   10.647891  0.0001264  0.0004605
Asv663    0.9749372  0.0956990   10.187539  0.0001564  0.0005316
Asv790    0.0095048  0.0008926   10.647891  0.0001264  0.0004605
Asv945    0.0950483  0.0089265   10.647891  0.0001264  0.0004605
Asv1055   0.0380193  0.0035706   10.647891  0.0001264  0.0004605
Asv1085   0.1710870  0.0160677   10.647891  0.0001264  0.0004605
Asv1141   0.0665338  0.0062485   10.647891  0.0001264  0.0004605
Asv1209   0.0285145  0.0026779   10.647891  0.0001264  0.0004605
Asv1454   0.1283152  0.0120508   10.647891  0.0001264  0.0004605
Asv1578   0.0285145  0.0026779   10.647891  0.0001264  0.0004605
Asv1728   0.2281160  0.0214236   10.647891  0.0001264  0.0004605
Asv1817   0.2946498  0.0276721   10.647891  0.0001264  0.0004605
Asv2018   0.3390505  0.0720444    4.706133  0.0053079  0.0159238
Asv2336   0.1045531  0.0098191   10.647891  0.0001264  0.0004605

  1. Are the answers to the above the same using mothur and QIIME2 processed data?

The overall trends were similar between the two pipelines, but the statistical interpretations differed. For starters, the Shannon diversity of the whole microbial community was generally lower when estimated from mothur-processed data than from QIIME2-processed data. Additionally, ANOVA indicated no significant change in Shannon diversity across depth for the mothur data (p = 0.054), whereas the change was significant for the QIIME2 data (p = 0.022). Within the phylum Cyanobacteria, the QIIME2 pipeline produced 51 ASVs compared to 15 OTUs from the mothur pipeline. Beyond this difference in richness, 5 of the OTUs changed significantly in abundance with both depth and oxygen, while none of the ASVs changed significantly across depth and 17 ASVs changed significantly with oxygen concentration. Despite these differences, the abundance of Cyanobacteria as a whole changed significantly with depth and oxygen along the water column in both pipelines.

ANOVA on alpha-diversity of mothur data across depth
##             Df Sum Sq Mean Sq F value Pr(>F)  
## Depth_m      1  2.355  2.3554   6.265 0.0543 .
## Residuals    5  1.880  0.3759                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ANOVA on alpha-diversity of QIIME2 data across depth
##             Df Sum Sq Mean Sq F value Pr(>F)  
## Depth_m      1  3.563   3.563   10.65 0.0224 *
## Residuals    5  1.673   0.335                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
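
The Pr(>F) values in the two ANOVA tables above follow from the reported F values and degrees of freedom. As a quick check, the sketch below recomputes them as the upper tail of the F distribution with base R's `pf()`. The F values are the rounded ones printed in the tables, so the results should be close to, but not necessarily identical with, the reported 0.0543 and 0.0224.

```r
# F values and degrees of freedom copied from the two ANOVA tables above.
f_mothur <- 6.265
f_qiime2 <- 10.65
df_model <- 1   # Depth_m
df_resid <- 5   # Residuals

# The upper-tail probability of the F distribution gives Pr(>F).
p_mothur <- pf(f_mothur, df_model, df_resid, lower.tail = FALSE)
p_qiime2 <- pf(f_qiime2, df_model, df_resid, lower.tail = FALSE)

print(round(c(mothur = p_mothur, QIIME2 = p_qiime2), 4))
```

With only 5 residual degrees of freedom, a modest difference in F (6.3 versus 10.7) is enough to move the result across the 0.05 threshold, which is worth keeping in mind when comparing the two pipelines.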

Discussion

The significant differences in Cyanobacteria abundance across depth and oxygen may be caused by photosynthesis and water temperature. Bacteria of the phylum Cyanobacteria obtain their energy through photosynthesis. These phototrophs are characterized by phycocyanin, a bluish pigment that functions as an auxiliary light-harvesting protein complex to chlorophyll. Phycocyanin absorbs orange and red light at approximately 620 nm and fluoresces at about 650 nm, depending on the species (10). This is particularly interesting because orange and red light are typically absorbed within the first 50 m of a water column (11). Our data in Fig 5 & 6 show significant differences in the abundance of Cyanobacteria across the Saanich Inlet depth profiles in both the mothur and QIIME2 pipelines. This is likely a result of Cyanobacteria thriving at the upper boundaries of the water column, where they can still absorb the red and orange light that is essential for cyanobacterial photosynthesis. These findings are further supported by the oxygen and fluorescence profiles across the water column (Fig 11). The concentrations of oxygen and chlorophyll a differ significantly across depth: higher concentrations of both were found within the top 50 m, which indicates increased photosynthetic activity. Moreover, the abundance of Cyanobacteria differed significantly across oxygen and chlorophyll a concentrations for both pipelines; high oxygen and chlorophyll a concentrations were associated with a larger abundance. With cyanobacterial photosynthesis contributing a substantial proportion of the oxygen in Earth’s atmosphere, it is not surprising that a high abundance of Cyanobacteria is associated with high concentrations of oxygen and chlorophyll a and with shallow depth.
Moreover, it has been reported that the growth rate of cyanobacteria is significantly influenced by temperature. Cyanobacteria were observed to have a lower growth rate at colder temperatures. For marine cyanobacteria, the optimal growth temperature range is 20-27.5 °C; at these temperatures, cyanobacteria grow at a rate of 0.8 d⁻¹ (12). When the temperature dropped to approximately 15 °C, the growth rate of cyanobacteria slowed to 0.22 d⁻¹. Interestingly, when the temperature dropped below 10 °C, cyanobacterial growth came to a complete stop (13). Therefore, in our study, the decline in cyanobacterial abundance with depth may be at least partly attributed to the decreasing temperature. According to the temperature data for each sample, the temperature is close to 13 °C at 10 m and decreases to about 9 °C at depths below 100 m. Hence, the growth rate of cyanobacteria at depths below 100 m is substantially slower than the growth rate near the water’s surface, which leads to a lower abundance at greater depths.

Differences between pipelines make it difficult to draw firm conclusions in microbial ecology, since we cannot distinguish genuine biological differences from differences introduced by the pipeline used. Such differences could also suggest that one pipeline is better suited to the dataset. In fact, they could also be exploited, with a pipeline being selected for the results it gives rather than for its appropriateness to the given dataset.

In the context of this project, the main difference between the two pipelines is whether the pipeline produces OTUs (mothur) or ASVs (QIIME2). The two pipelines use different algorithms to determine “true” sequences, and as a result, the QIIME2 pipeline usually yields far more ASVs than the mothur pipeline yields OTUs. In downstream analysis, this may be one of the reasons for the large disparity in the number and quality of the sequences produced, even when starting from the same initial data.

However, when it came to counting abundances within our taxon, Cyanobacteria, the QIIME2 results showed a far lower abundance than the mothur results, which cannot be explained by ASVs being more numerous than OTUs. Interestingly, running the estimate_richness function on our QIIME2 data produced a warning (see the output above) that the data did not contain any singletons and that the results are probably unreliable or wrong. This warning did not occur when running the function on the mothur data. The reason for this warning should be investigated further before fully trusting the results of this function with QIIME2 data.

In subsequent analyses with these two pipelines, each function in the phyloseq package should perhaps be tested in more depth. A dataset from a well-studied, known community could be used so that the output of each function can be compared across pipelines, and the results from the mothur and QIIME2 pipelines can be assessed against the expected results. Use cases should also be considered, and standard use cases for each pipeline should be documented so that pipelines are not exploited for favourable results. Mothur may be better suited to one dataset, while QIIME2 could be more appropriate for another dataset that calls for a “denoising” algorithm.

Future directions for this project could involve analyzing water samples from other OMZs at various depths and examining the Cyanobacteria data in those samples to see whether the relationships among oxygen, nitrogen, and the abundance of the phylum observed in this study also hold there. These samples could again be analyzed with both mothur and QIIME2 for further comparison between the pipelines.

References

  1. Walsh DA, Zaikova E, Howes CG, Song YC, Wright JJ, Tringe SG, Hallam SJ. 2009. Metagenome of a versatile chemolithoautotroph from expanding oceanic dead zones. Science 326(5952): 578-582.
  2. Torres-Beltrán M, Hawley AK, Capelle D, Zaikova E, Walsh DA, Mueller A, Finke J. 2017. A compendium of geochemical information from the Saanich Inlet water column. Nature scientific data 4(170159).
  3. Hawley AK, Torres-Beltrán M, Zaikova E, Walsh DA, Mueller A, Scofield M, Kheirandish S, Payne C, Pakhomova L, Bhatia M, Shevchek O, Gies EA, Fairley D, Malfatii SA, Norbeck AD, Brewer HM, Pasa-Tolic L, del Rio TG, Suttle CA, Trige S, Hallam SJ. 2017. Data Descriptor: A compendium of multi-omic sequence information from the Saanich Inlet water column. Nature scientific data 4(170160).
  4. Hallam SJ, Torres-Beltrán M, Hawley AK. 2017. Comment: Monitoring microbial responses to ocean deoxygenation in a model oxygen minimum zone. Nature scientific data 4(170158).
  5. National Research Council. 2007. The new science of metagenomics: revealing the secrets of our microbial planet. National Academies Press, Washington, DC.
  6. Bowers RM, Kyrpides NC, Stepanauskas R, Harmon-Smith M, Doud D, Reddy TBK, Tringe SG. 2017. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nature Biotechnology 35: 725-731.
  7. Falkowski PG, Fenchel T, Delong EF. 2008. The microbial engines that drive Earth’s biogeochemical cycles. Science 320(5879): 1034-1039.
  8. Wooley JC, Godzik A, Friedberg I. 2010. A primer on metagenomics. PLoS computational biology 6(2), e1000667.
  9. Callahan BJ, McMurdie PJ, Holmes SP. 2017. Exact sequence variants should replace operational taxonomic units in marker-gene data analysis. The ISME journal 11(12): 2639-2643.
  10. Simis SG, Huot Y, Babin M, Seppala J, and Metsamaa L. 2012. Optimization of variable fluorescence measurements of phytoplankton communities with cyanobacteria. Photosynthesis research 112(1): 13-30.
  11. Light in the ocean (n.d.) [Online]. Link: https://manoa.hawaii.edu/exploringourfluidearth/physical/ocean-depths/light-ocean.
  12. Boyd PW, Rynearson TA, Armstrong EA, Fu F, Hayashi K, Hu Z, Hutchins DA, Kudela RM, Litchman E, Mulholland MR, Passow U, Strzepek RF, Whittaker KA, Yu E, Thomas MK. 2013. Marine phytoplankton temperature versus growth responses from polar to tropical waters - outcome of a scientific community-wide study. PLoS ONE 8(5): e63091.
  13. Berg M, Sutula M. 2015. Factors affecting the growth of cyanobacteria with special emphasis on the Sacramento-San Joaquin Delta. Southern California Coastal Water Research Project Technical Report 869.

Module 04

Module 04 portfolio check

Project 2

  • CATME final group assessment
    • Completion status:
    • Comments:
  • Project 2
    • Report (80%):
    • Participation (20%):

Abstract

Writing

Editing

  • I edited some parts of the abstract

Introduction

Literature research

Writing

Editing

Methods

Writing

Editing

Results

Analysis

Figures

  • I interpreted some of the figures that are not included (the iTOL tree), for classes of bacteria not shown in the presented figures

Writing

Editing

Discussion

Literature research

  • I looked for papers to incorporate or mention in the discussion; looked for correlates with pH or temperature

Writing

  • I wrote part of the discussion, including future directions

Editing

Project_2

Abstract

Genes encoding key steps in the nitrogen cycle are well defined and provide a basis for functional anchor screening to determine their distribution across prokaryotic taxa. In this study, Saanich Inlet was used as a model ecosystem to investigate the metabolic coupling and symbiotic interactions that influence the nitrogen cycle in oxygen minimum zones. Water samples were collected at seven major depths spanning the oxycline, and genomic DNA and RNA were extracted and sequenced. The resulting reads were processed, assembled, and analyzed using the Tree-based Sensitive and Accurate Protein Profiler (TreeSAPP) pipeline to reconstruct the nitrogen cycle along defined redox gradients in Saanich Inlet. The narG gene was then investigated in detail. Its DNA levels were found to increase with depth, while its RNA levels decreased with depth. Proteobacteria contributed most narG DNA, while Actinobacteria contributed most RNA. Further, narG RNA abundance increased along with nitrite concentrations in the Inlet, but had the opposite relationship with nitrate concentrations; this was the expected result given that narG mediates the conversion of nitrate to nitrite. Overall, narG is responsible for a conversion that takes place at the beginning of the nitrogen cycle. This explains its presence at the given depths and provides a key example of the evolutionary and environmental reasoning behind the distributed metabolism seen in the nitrogen cycle.

Introduction

Oxygen minimum zones (OMZs) are regions in the ocean where dissolved oxygen concentrations fall below 20 \(\mu\)M (1). Due to global temperature increases and other effects caused by climate change, OMZs are expanding at a significant rate. Saanich Inlet is a seasonally anoxic fjord off the coast of British Columbia and a model ecosystem for studying the diversity and biochemical responses of microbial communities to the hypoxic environments commonly observed in OMZs (1, 2). In particular, Saanich Inlet has been used to model the metabolic coupling and symbiotic interactions that occur in OMZs (3). The inlet undergoes recurring cycles of water column stratification and deep water renewal, making it a useful model for studying microbial responses to changes in oceanic oxygenation levels (4). Increased primary productivity in surface waters during the spring, along with the limited mixing between the basin and surface waters, results in the development of an anoxic body of water with increasing depth in the Inlet (2). These anoxic regions become highly populated with chemolithoautotrophs, eventually leading to a decrease in aerobically respiring organisms deeper within these zones. Past studies have shown that these types of metabolic shifts usually lead to a loss of nitrogen along with the production of notable greenhouse gases such as methane (CH4) and nitrous oxide (N2O) (1). Some species have also been shown to engage in sulfide (H2S) oxidation and nitrate (NO3-) reduction pathways within these zones (5).

The nitrogen cycle, the biogeochemical cycle by which nitrogenous compounds are interconverted between chemical forms for environmental circulation, consists mainly of nitrogen fixation, nitrification and denitrification, and was catalyzed by microorganisms long before the appearance of humans (6). Even today, much of it is still dictated by the diverse microbial niches present in the surrounding environment. An example of this control can be seen in denitrification, the conversion of NO3- to nitrogen gas (N2), which is carried out by denitrifying bacteria. Denitrification is a highly important process, as it prevents the accumulation of nitrogen compounds to toxic levels that could lead to eutrophication and maintains the homeostasis of nitrogen distribution between soil and atmosphere. With an atmospheric residence time of approximately 1 billion years, nitrogen gas (N2) is highly inert and is not accessible for the synthesis of proteins and nucleic acids, though this problem is solved by the conversion of N2 to NH4+ through nitrogen fixation (6).

Recent literature has highlighted that microbial taxa do not necessarily integrate whole elemental cycles into their individual metabolic pathways, but often divide these reactions among the community so that consortia of taxa must rely on each other to fulfill their metabolic requirements (2). The aim of this project was to characterize the involvement of our gene of interest, narG, which is known for its role in the reduction of NO3- to NO2-, in the denitrification cycle.

Methods

Water samples were collected from Saanich Inlet at depths of 10, 100, 120, 135, 150, 165, and 200 m, spanning the oxycline, during Saanich Inlet Cruise 72. The samples were subsequently filtered through a 0.22 \(\mu\)m Sterivex filter to collect biomass, and both genomic DNA and RNA were extracted from these samples. The extracted RNA was converted into cDNA. Both total genomic DNA and cDNA were used to construct the shotgun Illumina libraries. Sequencing data were generated on the Illumina HiSeq platform with 2x150 bp technology. Further sampling and sequencing details can be found in A compendium of multi-omic sequence information from the Saanich Inlet water column by Hawley et al. (2017).

The IMG/M pipeline was then used to process the resulting reads. Metapathways 2.5 was used to assemble and process the metagenomes. A beta version of Tree-based Sensitive and Accurate Protein Profiler (TreeSAPP) pipeline was used to reconstruct the nitrogen cycle along defined redox gradients in Saanich Inlet using Google Cloud services. iTOL version 4.0 was used to generate phylogenetic trees based on the processed data. The contig maps were loaded into R version 3.4.3. The Tidyverse, Cowplot and Phyloseq packages were loaded and used to complete the data analysis.

Results

Environment setup and Data Cleaning

The data were loaded, parsed (the initial data contain all genes, but we are only interested in narG), and renamed. Only the taxonomy, abundance, and query data are required.

Data manipulation into a single data frame

## Warning: Expected 7 pieces. Missing pieces filled with `NA` in 8302
## rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
## 20, ...].

The warning above was ignored because not all queries could be classified down to the species level; the unclassified cells are filled with NA.
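
The warning resembles the one emitted by `tidyr::separate()` when a string has fewer pieces than the number of target columns, although the report does not show which function was used. The base R sketch below shows the same idea under stated assumptions: the taxonomy strings and the "; " delimiter are hypothetical, and the seven rank names are taken from the columns of Table 1.

```r
# Rank names taken from the column headers of Table 1.
ranks <- c("Domain", "Phylum", "Class", "Order", "Family", "Genus", "Species")

# Hypothetical taxonomy strings; the "; " delimiter is an assumption.
taxonomy <- c(
  "Bacteria",
  "Bacteria; Proteobacteria; Gammaproteobacteria",
  "Bacteria; Actinobacteria; Actinobacteria; Micrococcales"
)

# Split each string, then pad the unclassified lower ranks with NA.
split_ranks <- lapply(strsplit(taxonomy, "; ", fixed = TRUE), function(parts) {
  length(parts) <- length(ranks)  # extending a vector fills it with NA
  parts
})

# Bind the padded vectors into one data frame with one column per rank.
tax_df <- as.data.frame(do.call(rbind, split_ranks), stringsAsFactors = FALSE)
names(tax_df) <- ranks

print(tax_df)
```

If `tidyr::separate()` is in use, passing `fill = "right"` requests the same right-hand NA padding explicitly and avoids the warning.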

Final data table

Table 1. Final data of narG gene

## # A tibble: 8,302 x 11
##    Depth_m Type  Abundance Domain  Phylum Class Order Family Genus Species
##      <dbl> <chr>     <dbl> <chr>   <chr>  <chr> <chr> <chr>  <chr> <chr>  
##  1    10.0 DNA        7.97 Bacter… <NA>   <NA>  <NA>  <NA>   <NA>  <NA>   
##  2    10.0 DNA        1.43 Bacter… Prote… Gamm… uncl… Candi… <NA>  <NA>   
##  3    10.0 DNA        1.59 Bacter… Prote… <NA>  <NA>  <NA>   <NA>  <NA>   
##  4    10.0 DNA        1.02 Bacter… Prote… Beta… Burk… Burkh… <NA>  <NA>   
##  5    10.0 DNA        1.45 Bacter… Prote… Gamm… uncl… Candi… <NA>  <NA>   
##  6    10.0 DNA        1.34 Bacter… Prote… Alph… Pela… envir… <NA>  <NA>   
##  7    10.0 DNA        1.64 Bacter… Prote… Gamm… uncl… sulfu… <NA>  <NA>   
##  8    10.0 DNA        1.65 Bacter… Prote… Gamm… uncl… Candi… <NA>  <NA>   
##  9    10.0 DNA        3.32 Bacter… Prote… Gamm… uncl… Candi… <NA>  <NA>   
## 10    10.0 DNA       NA    Bacter… Actin… Acti… Micr… Micro… <NA>  <NA>   
## # ... with 8,292 more rows, and 1 more variable: Query <chr>
1. How does the DNA abundance of narG gene differ with depth?

Figure 1 indicates that the DNA abundance of the narG gene shows an overall increasing trend with depth. It increases gradually from 21.40 at 10 m to 990.72 at 150 m. It then declines to 249.21 at 165 m, but increases again to 717.90 at 200 m.
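
A per-depth total like the ones quoted above can be obtained by summing the per-query abundances within each depth. The sketch below assumes that is how the totals were computed, which the report does not state. It uses a small hypothetical data frame with the same column names as Table 1; the values are illustrative and are not the Saanich data.

```r
# Hypothetical rows shaped like Table 1 (values are illustrative only).
narg <- data.frame(
  Depth_m   = c(10, 10, 10, 100, 100),
  Type      = c("DNA", "DNA", "RNA", "DNA", "RNA"),
  Abundance = c(7.97, NA, 2.50, 40.10, 300.00)
)

# Keep DNA rows only, then sum abundance within each depth.
# The formula interface of aggregate() drops rows with NA abundance by default.
dna <- narg[narg$Type == "DNA", ]
dna_by_depth <- aggregate(Abundance ~ Depth_m, data = dna, FUN = sum)

print(dna_by_depth)
```

Rows with a missing abundance, such as row 10 of Table 1, are dropped rather than counted as zero in this sketch.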

2. How does the RNA abundance of narG gene differ with depth? Is this similar to the DNA levels?

The RNA abundance of the narG gene does not follow the same trend with depth as the DNA abundance. Figure 2 shows that the RNA abundance of the narG gene increases significantly from 10 m to 100 m, reaching its second highest level of 3585.45 at 100 m. Interestingly, the RNA abundance fluctuates between 2500 and 3700 at depths of 100-135 m. After that, it decreases gradually until 200 m. In addition, the RNA abundance of the narG gene is much larger than the DNA abundance at depths of 100-165 m, while less narG RNA than DNA is found in the samples from 10 m and 200 m.

3. What taxa are responsible for narG gene? Are they the same with depth? With DNA versus RNA?

As shown in Figure 3, Actinobacteria, candidatus Omnitrophica, Chlorobi, Euryarchaeota, metagenomes, Proteobacteria, and unclassified Bacteria are responsible for narG DNA. Among them, Actinobacteria, candidatus Omnitrophica, Chlorobi, Euryarchaeota, and metagenomes have an evenly distributed DNA abundance at depths deeper than 100 m. However, the distribution of Proteobacterial DNA abundance with depth follows a trend similar to that of total narG DNA abundance: it increases from 10 m to 150 m, declines between 150 m and 165 m, and finally increases again at 200 m. In contrast, the DNA abundance of unclassified Bacteria stays the same at depths shallower than 165 m but increases slightly at 200 m.

Actinobacteria, Proteobacteria, and unclassified Bacteria make the dominant contributions to narG RNA among the 11 phyla. Actinobacteria is the predominant phylum for narG RNA at 100 m and 120 m, but its RNA abundance decreases gradually in deeper water and eventually becomes undetectable at 200 m. Both Proteobacteria and unclassified Bacteria have detectable RNA at all seven depths, although their RNA abundances do not follow the same trend with depth. Proteobacterial RNA abundances at 150 m and 200 m are higher than those at other depths, while unclassified Bacteria have their lowest RNA abundance at 150 m and 200 m. Additionally, the RNA abundance of unclassified Bacteria increases with depth in shallow water and reaches its highest point at 135 m.

## Warning: Removed 7265 rows containing missing values (geom_point).

4. How does the abundance of narG gene relate to nitrogen species in Saanich?

The narG gene has a significant impact on nitrite (NO2-) and nitrate (NO3-) concentrations across depth in Saanich Inlet, because it is responsible for the reduction of nitrate to nitrite. As a result, where the RNA abundance of narG remains high at depths of 100-135 m (Figure 2), the nitrate concentration stops increasing and declines dramatically from its highest point at 100 m (Figure 4D). On the other hand, because nitrification is poor in deeper water, the nitrate concentration continues to decrease with depth until it reaches 0 at 165 m, even though narG RNA is less abundant deeper than 150 m. Similarly, narG is one of the key factors behind the fluctuation of the nitrite concentration at depths of 100-165 m. The low ammonium concentration at depths of 10-135 m indicates a high level of nitrification in the shallow water (Figure 4A), which is responsible for the significant increase in nitrate and the decrease in nitrite at depths of 10-100 m. However, the nitrite concentration does not keep decreasing deeper than 100 m; instead, it remains around 0.09 \(\mu\)M as a result of the high level of nitrate reduction driven by the abundant narG RNA. Therefore, narG gene abundance is closely related to nitrite and nitrate concentrations in Saanich Inlet.

Discussion

In anthropogenic times, NO3- levels have significantly increased in environmental water systems as a result of agricultural activities (7). With increases in NO3- entering these systems, the accumulation of nitrogen compounds above threshold levels can lead to eutrophication. Microbe-driven processes such as denitrification are able to counteract this phenomenon by converting NO3- to N2. In this context, our gene of interest, narG, specifically mediates the first step of this pathway. As the alpha subunit and catalytic domain of the membrane-bound dissimilatory nitrate reductase, narG is responsible for the conversion of NO3- to NO2-. NO2- then becomes the substrate of nirK and nirS and, through subsequent steps and enzymes, eventually leads to the production of N2 (8). Interestingly, it has been found that the transport of nitrate into the cytoplasm, where the active site of NarG is located, is inhibited by oxygen, making denitrification an anaerobic process (9). As a result, greater amounts of narG transcription are expected in OMZs than in surface waters. In Figure 2, peak transcription of narG is found at 100 m, where oxygen levels are approaching OMZ conditions (Figure 5). Additionally, the large abundance of narG transcripts at 100 m and deeper can be explained by the corresponding abundance of the NO3- substrate. Many denitrifying facultative anaerobic microorganisms respire NO3- in the absence of oxygen. Denitrification is considered to be the highest energy-yielding respiration system in anoxic environments (10). Increases in NO3- at anoxic depths lead to the reduction of nitrate to nitrite in the cytoplasm by narG. It has been found that this process is coupled to the translocation of protons into the periplasm, which directly contributes to a proton-motive force for energy conservation (8).
Therefore, it is not surprising that narG transcription levels are near absent at oxic depths where oxygen can still be used as the preferential terminal electron acceptor (TEA), and abundant at anoxic depths where NO3- can be used as an alternative TEA.

Metabolism in the nitrogen cycle is widely distributed, such that each reaction step is catalyzed by a specific enzyme from a different bacterial species carrying the corresponding genes. This distribution may be the consequence of the distinct energy and nutrient sources that are available in different ecosystems, as well as of other mechanisms, namely horizontal gene transfer and adaptive gene loss.

In aerobic environments like shallow water and topsoil, nitrification is the predominant process of the nitrogen cycle, due to the availability of ammonia and nitrite as energy sources for nitrifying bacteria and the inhibitory effect of oxygen on denitrifiers (11,12). In contrast, denitrification occurs in anoxic environments, including OMZs and marine sediments, where nitrate serves as an energy and nutrient source and the inhibition of denitrification by oxygen is minimal (8). In addition, some bacteria have evolved additional mechanisms to protect nitrogen-related enzymes from unfavorable environments. For example, Trichodesmium spp. and Nodularia spp. separate nitrogen fixation from photosynthesis in order to minimize oxygen suppression of nitrogenase, the enzyme for nitrogen fixation (8). Therefore, environmental conditions are one of the possible reasons for the distributed metabolism.

Another potential explanation for the metabolic distribution of the nitrogen cycle is horizontal gene transfer (HGT). Bacteria may acquire nitrogen-related genes from other species through HGT and take up the niche of the corresponding reactions in the nitrogen cycle. For instance, ammonia monooxygenase genes, which encode the key enzyme required for ammonia oxidation, are widely distributed among different bacterial species (6). As a result, if a certain bacterial species responsible for ammonia oxidation became extinct in a specific ecosystem, other species carrying ammonia monooxygenase genes and able to perform ammonia oxidation would take up the niche of the extinct one. On the other hand, the retention of horizontally transferred genes is largely driven by selective pressure on nutrients and energy, and thus different processes in the nitrogen cycle are widely distributed among microorganisms and ecosystems (6).

Gene loss may also contribute to the wide distribution of metabolism in the nitrogen cycle. The Black Queen Hypothesis (BQH) was recently proposed to explain why free-living microorganisms that have lost metabolic genes depend on other species (13). The hypothesis defines two groups of bacterial species: “helpers” and “beneficiaries”. It predicts that in a microbial community, “beneficiaries”, which are selected to lose a costly and dispensable function, will obtain that function from a fraction of other individuals in the community, as long as the function remains sufficient to support the whole community (13). This hypothesis can explain why nitrogen fixation is performed by only a small fraction of microorganisms in the ocean: genes responsible for nitrogen fixation may have been selectively lost during evolution, turning fixed N into a public good (13).

This study paves the way for future investigations into metabolic coupling and symbiotic interactions in the nitrogen cycle. It appears that narG levels are regulated by oxygen levels and the availability of the NO3- substrate; however, the effect of other environmental factors has yet to be related to gene abundance. According to Palmer et al., in sediment communities, Actinobacterial narG dominates in extreme environments characterized by low pH or high temperature, while Proteobacterial narG is present in pH-neutral soils (15). Given that Actinobacteria and Proteobacteria dominated the DNA and transcript abundance of narG in an aquatic context, it would be useful to determine whether these abundances are due to pH, temperature, or other measurable geochemical parameters.

Additionally, as the nitrate reductase encoded by narG is inhibited by its own product, nitrite, it may be interesting to see how narG abundance varies with the abundances of nirK and nirS, which encode nitrite-consuming enzymes. Previous studies have noted that nirS was dominated by \(\beta\)-Proteobacteria, while nirK was dominated by \(\alpha\)-Proteobacteria (14). Our results also suggest that narG is dominated in part by Proteobacteria, although a more in-depth analysis of the different classes of Proteobacteria has not been attempted. Such an analysis may be necessary to establish how the abundances of nirK, nirS, and narG covary within the Proteobacteria phylum.

References

  1. Walsh DA, Zaikova E, Howes CG, Song YC, Wright JJ, Tringe SG, Hallam SJ. 2009. Metagenome of a versatile chemolithoautotroph from expanding oceanic dead zones. Science 326(5952): 578-582.
  2. Torres-Beltran M, Hawley AK, Capelle D, Zaikova E, Walsh DA, Mueller A, Finke J. 2017. A compendium of geochemical information from the Saanich Inlet water column. Nature scientific data 4(170159).
  3. Hawley AK, Torres-Beltran M, Zaikova E, Walsh DA, Mueller A, Scofield M, Kheirandish S, Payne C, Pakhomova L, Bhatia M, Shevchek O, Gies EA, Fairley D, Malfatti SA, Norbeck AD, Brewer HM, Pasa-Tolic L, del Rio TG, Suttle CA, Tringe S, Hallam SJ. 2017. Data Descriptor: A compendium of multi-omic sequence information from the Saanich Inlet water column. Nature scientific data 4(170160).
  4. Hallam SJ, Torres-Beltran M, Hawley AK. 2017. Comment: Monitoring microbial responses to ocean deoxygenation in a model oxygen minimum zone. Nature scientific data 4(170158).
  5. National Research Council. 2007. The new science of metagenomics: revealing the secrets of our microbial planet. National Academies Press, Washington, DC.
  6. Falkowski PG, Fenchel T, Delong EF. 2008. The microbial engines that drive Earth’s biogeochemical cycles. Science 320(5879): 1034-1039.
  7. Smith CJ, Nedwell DB, Dong LF, Osborn AM. 2007. Diversity and Abundance of Nitrate Reductase Genes (narG and napA), Nitrite Reductase Genes (nirS and nrfA), and Their Transcripts in Estuarine Sediments. Applied and environmental microbiology 73(11): 3612-3622.
  8. Kuypers MM, Marchant HK, Kartal B. 2018. The microbial nitrogen-cycling network. Nature Reviews Microbiology.
  9. Moreno-Vivian C, Cabello P, Martinez-Luque M, Blasco R, Castillo F. 1999. Prokaryotic nitrate reduction: molecular properties and functional distinction among bacterial nitrate reductases. Journal of bacteriology 181(21): 6573-6584.
  10. Strohm TO, Griffin B, Zumft WG, Schink B. 2007. Growth yields in bacterial denitrification and nitrate ammonification. Applied and environmental microbiology 73(5): 1420-1424.
  11. Ward BB. 2008. Nitrification, p 2511-2518. In Jorgensen SE and Fath BD (ed), Ecological Processes, vol. 3 of Encyclopedia of Ecology, 5 vols. Oxford: Elsevier.
  12. Nakajima M, Hayamize T, Nishimura H. 1983. Effect of oxygen concentration on the rates of denitrification and denitrification in the sediments of an eutrophic lake. Water Resources, vol. 18, no. 3, p 335-337.
  13. Morris JJ, Lenski RE, Zinser ER. 2012. The black queen hypothesis: Evolution of dependencies through adaptive gene loss. mBio 3(2). DOI: 10.1128/mBio.00036-12.
  14. Yu Z, Yang J, Liu L. 2014. Denitrifier Community in the Oxygen Minimum Zone of a Subtropical Deep Reservoir. PLoS One 9(3): e92055.
  15. Palmer K, Horn MA. 2015. Denitrification Activity of a Remarkably Diverse Fen Denitrifier Community in Finnish Lapland Is N-Oxide Limited. PLoS One 10(4): e0123123.